Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-08 Thread Ola Fosheim Grøstad via Digitalmars-d
On Saturday, 8 September 2018 at 14:20:10 UTC, Laeeth Isharc 
wrote:
Religions have believers but not supporters - in fact saying 
you are a supporter says you are not a member of that faith or 
community.


If you are a supporter of Jesus Christ's efforts, then you most 
certainly are a Christian. If you are a supporter of the Pope, 
then you may or may not be Catholic, but you most likely are 
Christian or sympathise with the faith.


Programming languages are more like powertools. You may be a big 
fan of Makita and dislike using other powertools like Bosch and 
DeWalt, or you may have different preferences based on the 
situation, or you may accept whatever you have at hand. Being a 
supporter is stretching it though... Although I am sure that 
people who only have Makita in their toolbox feel that they are 
supporting the company.


Social institutions need support to develop - language is a 
very old human institution, and programming languages have more 
similarity with natural languages alongst certain dimensions 
(I'm aware that NLP is your field) than some recognise.


Sounds like a fallacy.

So, why shouldn't a language have supporters?  I give some 
money to the D Foundation - this is called providing support.


If you hope to gain some kind of return for it or consequences 
that you benefit from then it is more like obtaining support and 
influence through providing funds. I.e. paying for support...


It's odd - if something isn't useful for me then either I just 
move on and find something that is, or I try to directly act 
myself or organise others to improve it so it is useful.  I 
don't stand there grumbling at the toolmakers whilst taking no 
positive action to make that change happen.


Pointing out that there is a problem, that needs to be solved, in 
order to reach a state where the tool is applicable in a 
production line... is not grumbling.  It is healthy.  Whether 
that leads to positive actions (changes in policies) can only be 
affected through politics, not "positive action".  Doesn't help 
to buy a new, bigger and better motor, if the transmission is 
broken.





2018-09-08 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 8, 2018 8:05:04 AM MDT Laeeth Isharc via Digitalmars-d wrote:
> On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis wrote:
> > On Thursday, September 6, 2018 1:04:45 PM MDT aliak via Digitalmars-d wrote:
> >> D makes the code-point case default and hence that becomes the
> >> simplest to use. But unfortunately, the only thing I can think of
> >> that requires code point representations is when dealing
> >> specifically with unicode algorithms (normalization, etc). Here's
> >> a good read on code points:
> >> https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -
> >>
> >> tl;dr: application logic does not need or want to deal with
> >> code points. For speed units work, and for correctness,
> >> graphemes work.
> >
> > I think that it's pretty clear that code points are objectively
> > the worst level to be the default. Unfortunately, changing it
> > to _anything_ else is not going to be an easy feat at this
> > point. But if we can first ensure that Phobos in general
> > doesn't rely on it (i.e. in general, it can deal with ranges of
> > char, wchar, dchar, or graphemes correctly rather than assuming
> > that all ranges of characters are ranges of dchar), then maybe
> > we can figure something out. Unfortunately, while some work has
> > been done towards that, what's mostly happened is that folks
> > have complained about auto-decoding without doing much to
> > improve the current situation. There's a lot more to this than
> > simply ripping out auto-decoding even if every D user on the
> > planet agreed that outright breaking almost every existing D
> > program to get rid of auto-decoding was worth it. But as with
> > too many things around here, there's a lot more talking than
> > working. And actually, as such, I should probably stop
> > discussing this and go do something useful.
>
> A tutorial page linked from the front page with some examples
> would go a long way to making it easier for people.  If I had
> time and understood strings enough to explain to others I would
> try to make a start, but unfortunately neither are true.

Writing up an article on proper Unicode handling in D is on my todo list,
but my todo list of things to do for D is long enough that I don't know when
I'm going to get to it.

> And if we are doing things right with RCString, then isn't it
> easier to make the change with that first - which is new so can't
> break code - and in some years when people are used to working
> that way update Phobos (compiler switch in beginning and have big
> transition a few years after that).

Well, I'm not actually convinced that what we have for RCString right now
_is_ doing the right thing, but even if it is, that doesn't fix the issue
that string doesn't do the right thing, and code needs to take that into
account - especially if it's generic code. The better job we do at making
Phobos code work with arbitrary ranges of characters, the less of an issue
that is, but you're still pretty much forced to deal with it in a number of
cases if you want your code to be efficient or if you want a function to be
able to accept a string and return a string rather than a wrapper range.
Using RCString in your code would reduce how much you had to worry about it,
but it doesn't completely solve the problem. And if you're doing stuff like
writing a library for other people to use, then you definitely can't just
ignore the issue. So, an RCString that handles Unicode sanely will
definitely help, but it's not really a fix. And plenty of code is still
going to be written to use strings (especially when -betterC is involved).
RCString is going to be another option, but it's not going to replace
string. Even if RCString became the most common string type to use (which I
question is ever going to happen), dynamic arrays of char, wchar, etc. are
still going to exist in the language and are still going to have to be
handled correctly.
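
As a rough illustration of what "working with arbitrary ranges of characters" can look like in generic code, here is a minimal sketch. The name countSpaces is made up for the example and is not a Phobos function. It only assumes the existing std.utf.byCodeUnit, std.traits.isSomeChar and the std.range.primitives traits. It accepts strings of any encoding as well as non-string character ranges, and it sidesteps auto-decoding by iterating code units, which is safe here because an ASCII space never occurs inside a multi-code-unit sequence in UTF-8 or UTF-16.

```d
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byCodeUnit;

/// Hypothetical helper, not part of Phobos: counts ASCII spaces in any
/// range of characters without decoding it.
size_t countSpaces(R)(R r)
if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    size_t n = 0;
    // byCodeUnit turns off auto-decoding for strings and passes other
    // character ranges through unchanged.
    foreach (c; r.byCodeUnit)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    assert(countSpaces("a b c") == 2);           // UTF-8 string
    assert(countSpaces("a b c"w) == 2);          // UTF-16 string
    assert(countSpaces("\u00E4 \u00F6"d) == 1);  // UTF-32 string
    assert(countSpaces("a b".byCodeUnit) == 1);  // non-string range of char
}
```

A function that has to return a string rather than a wrapper range, or that needs code points or graphemes, would need more care than this.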

Phobos won't be able to assume that all of the code out there is using
RCString and not string. The combination of improving Phobos so that it
works properly with ranges of characters in general (and not just strings or
ranges of dchar) and having an alternate string type that does the right
thing will definitely help and need to be done if we have any hope of
actually removing auto-decoding, but even with all of that, I don't see how
it would be possible to really deprecate the old behavior. We _might_ be
able to do something if we're willing to deprecate std.algorithm and
std.range (since std.range gives you the current definitions of the range
primitives for arrays, and std.algorithm publicly imports std.range), but
you still then have the problem of two different definitions of the range
primitives for arrays and all of the problems that that causes (even if it's
only for the deprecation period). So, strings would end up behaving
drastically differently with range-based 


2018-09-08 Thread Jonathan M Davis via Digitalmars-d
On Thursday, September 6, 2018 3:15:59 PM MDT aliak via Digitalmars-d wrote:
> On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis wrote:
> > On Thursday, September 6, 2018 1:04:45 PM MDT aliak via Digitalmars-d wrote:
> >> D makes the code-point case default and hence that becomes the
> >> simplest to use. But unfortunately, the only thing I can think of
> >> that requires code point representations is when dealing
> >> specifically with unicode algorithms (normalization, etc). Here's
> >> a good read on code points:
> >> https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -
> >>
> >> tl;dr: application logic does not need or want to deal with
> >> code points. For speed units work, and for correctness,
> >> graphemes work.
> >
> > I think that it's pretty clear that code points are objectively
> > the worst level to be the default. Unfortunately, changing it
> > to _anything_ else is not going to be an easy feat at this
> > point. But if we can first ensure that Phobos in general
> > doesn't rely on it (i.e. in general, it can deal with ranges of
> > char, wchar, dchar, or graphemes correctly rather than assuming
> > that all ranges of characters are ranges of dchar), then maybe
> > we can figure something out. Unfortunately, while some work has
> > been done towards that, what's mostly happened is that folks
> > have complained about auto-decoding without doing much to
> > improve the current situation. There's a lot more to this than
> > simply ripping out auto-decoding even if every D user on the
> > planet agreed that outright breaking almost every existing D
> > program to get rid of auto-decoding was worth it. But as with
> > too many things around here, there's a lot more talking than
> > working. And actually, as such, I should probably stop
> > discussing this and go do something useful.
> >
> > - Jonathan M Davis
>
> Is there a unittest somewhere in phobos you know that one can be
> pointed to that shows the handling of these 4 variations you say
> should be dealt with first? Or maybe a PR that did some of this
> work that one could investigate?
>
> I ask so I can see in code what it means to make something not
> rely on autodecoding and deal with ranges of char, wchar, dchar
> or graphemes.
>
> Or a current "easy" bugzilla issue maybe that one could try a
> hand at?

Not really. The handling of this has generally been too ad-hoc. There are
plenty of examples of handling different string types, and there are a few
handling different ranges of character types, but there's a distinct lack of
tests involving graphemes. And the correct behavior for each is going to
depend on what exactly the function does - e.g. almost certainly, the
correct thing for filter to do is to not do anything special for ranges of
characters at all and just filter on the element type of the range (even
though it would almost always be incorrect to filter a range of char unless
it's known to be all ASCII), while on the other hand, find is clearly
designed to handle different encodings. So, it needs to be able to find a
dchar or grapheme in a range of char. And of course, there's the issue of
how normalization should be handled (if at all).
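
To make the filter/find contrast concrete, here is a small sketch. The sample string and the predicate are made up for the example; the calls are the existing std.algorithm filter and find plus std.utf.byCodeUnit.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.algorithm.searching : find;
import std.utf : byCodeUnit;

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    // filter just works on the element type, here plain code units.
    // Dropping the non-ASCII code units silently removes the accented letter.
    auto ascii = s.byCodeUnit.filter!(c => c < 0x80);
    assert(equal(ascii, "hllo".byCodeUnit));

    // find is given a dchar needle and a UTF-8 haystack, so it has to
    // bridge the two encodings to locate the character.
    dchar needle = '\u00E9';
    assert(s.find(needle) == "\u00E9llo");
}
```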

A number of the tests in std.utf and std.string do a good job of testing
Unicode strings of varying encodings, and std.utf does a good job overall of
testing ranges of char, wchar, and dchar which aren't strings, but I'm not
sure that anything in Phobos outside of std.uni currently does anything with
ranges of graphemes.

std.conv.to does have some tests for ranges of char, wchar, and dchar due to
a bug fix. e.g.

// bugzilla 15800
@safe unittest
{
    import std.conv : to; // only needed when run outside std.conv
    import std.utf : byCodeUnit, byChar, byWchar, byDchar;

    // code-unit ranges over UTF-8 and UTF-16 input, with and without a radix
    assert(to!int(byCodeUnit("10")) == 10);
    assert(to!int(byCodeUnit("10"), 10) == 10);
    assert(to!int(byCodeUnit("10"w)) == 10);
    assert(to!int(byCodeUnit("10"w), 10) == 10);

    // ranges transcoded to char, wchar and dchar
    assert(to!int(byChar("10")) == 10);
    assert(to!int(byChar("10"), 10) == 10);
    assert(to!int(byWchar("10")) == 10);
    assert(to!int(byWchar("10"), 10) == 10);
    assert(to!int(byDchar("10")) == 10);
    assert(to!int(byDchar("10"), 10) == 10);
}

but there are no grapheme tests, and no Unicode characters are involved
(though I'm not sure that much in std.conv really needs to worry about
Unicode characters).
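
For comparison, a test that does involve non-ASCII text and graphemes might look roughly like the sketch below. It is not taken from Phobos; graphemeCount is a hypothetical helper and the sample string is chosen purely for illustration. It decodes explicitly with std.utf.byDchar instead of relying on auto-decoding, then counts with std.uni.byGrapheme.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar;

/// Hypothetical helper, not part of Phobos: counts graphemes in a string
/// of any encoding by decoding it explicitly.
size_t graphemeCount(S)(S s)
{
    return s.byDchar.byGrapheme.walkLength;
}

void main()
{
    // "noël" with the diaeresis as a separate combining code point (U+0308):
    // five code points, but only four graphemes.
    assert(graphemeCount("noe\u0308l") == 4);   // UTF-8
    assert(graphemeCount("noe\u0308l"w) == 4);  // UTF-16
    assert(graphemeCount("noe\u0308l"d) == 4);  // UTF-32
}
```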

So, there are tests scattered all over the place which do pieces of what
they need to be doing, but I'm not sure that there are currently any that
test the full range of character ranges that they really need to be testing.
As with testing reference type ranges, such tests have generally been added
only when fixing a specific bug, and there hasn't been a sufficient effort
to just go through all of the affected functions and add appropriate tests.

And unfortunately, unlike with reference type ranges, the correct behavior
of a function when faced with ranges of different character types is going
to be 


2018-09-08 Thread Laeeth Isharc via Digitalmars-d

On Thursday, 6 September 2018 at 14:42:14 UTC, Chris wrote:
On Thursday, 6 September 2018 at 14:30:38 UTC, Guillaume Piolat 
wrote:

On Thursday, 6 September 2018 at 13:30:11 UTC, Chris wrote:
And autodecode is a good example of experts getting it wrong, 
because, you know, you cannot be an expert in all fields. I 
think the problem was that it was discovered too late.


There are very valid reasons not to talk about auto-decoding 
again:


- it's too late to remove because breakage
- attempts at removing it were _already_ tried
- it has been debated to DEATH
- there is an easy work-around

So any discussion _now_ would have the very same structure of 
the discussion _then_, and would lead to the exact same 
result. It's quite tragic. And I urge the real D supporters to 
let such conversation die (topics debated to death) as soon as 
they appear.


The real supporters? So it's a religion? For me it's about 
technology and finding a good tool for a job.


Religions have believers but not supporters - in fact saying you 
are a supporter says you are not a member of that faith or 
community.  I support the Catholic Church's efforts to relieve 
poverty in XYZ country - you're not a core part of that effort 
directly.


Social institutions need support to develop - language is a very 
old human institution, and programming languages have more 
similarity with natural languages alongst certain dimensions (I'm 
aware that NLP is your field) than some recognise.


So, why shouldn't a language have supporters?  I give some money 
to the D Foundation - this is called providing support.  Does 
that make me a zealot, or someone who confuses a computer 
programming language with a religion?  I don't think so.  I give 
money to the Foundation because it's a win-win.  It makes me 
happy to support the development of things that are beautiful and 
it's commercially a no-brainer because of the incidental benefits 
it brings.  Probably I would do so without those benefits, but on 
the other hand the best choices in life often end up solving 
problems you weren't even planning on solving and maybe didn't 
know you had.


Does that make me a monomaniac who thinks D should be used 
everywhere, and only D - the one true language?  I don't think 
so.  I confess to being excited by the possibility of writing web 
applications in D, but that has much more to do with Javascript 
and the ecosystem than it does D.  And on the other hand - even 
though I have supported the development of a Jupyter kernel for D 
(something that conceivably could make Julia less necessary) - 
I'm planning on doing more with Julia, because it's a better 
solution for some of our commercial problems than anything else I 
could find, including D.  Does using Julia mean we will write 
less D?  No - being able to do more work productively means 
writing more code, probably including more D, Python and C#.


I suggest the problem is in fact the entitlement of people who 
expect others to give them things for free without recognising 
that some appreciation would be in order, and that, if one can, 
helping in whatever way is possible is probably the right thing 
to do, even if it's in a small way in the beginning.  This is of 
course a well-known challenge of open-source projects in general, 
but it's my belief it's a fleeting period already passing for D.


You know sometimes it's clear from the way someone argues that it 
isn't about what they say.  If the things they claim were 
problems were in fact anti-problems (merits) they would make 
different arguments but with the same emotional tone.


It's odd - if something isn't useful for me then either I just 
move on and find something that is, or I try to directly act 
myself or organise others to improve it so it is useful.  I don't 
stand there grumbling at the toolmakers whilst taking no positive 
action to make that change happen.





2018-09-08 Thread Laeeth Isharc via Digitalmars-d
On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis 
wrote:
On Thursday, September 6, 2018 1:04:45 PM MDT aliak via 
Digitalmars-d wrote:

D makes the code-point case default and hence that becomes the
simplest to use. But unfortunately, the only thing I can think 
of

that requires code point representations is when dealing
specifically with unicode algorithms (normalization, etc). 
Here's

a good read on code points:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -

tl;dr: application logic does not need or want to deal with 
code points. For speed units work, and for correctness, 
graphemes work.


I think that it's pretty clear that code points are objectively 
the worst level to be the default. Unfortunately, changing it 
to _anything_ else is not going to be an easy feat at this 
point. But if we can first ensure that Phobos in general 
doesn't rely on it (i.e. in general, it can deal with ranges of 
char, wchar, dchar, or graphemes correctly rather than assuming 
that all ranges of characters are ranges of dchar), then maybe 
we can figure something out. Unfortunately, while some work has 
been done towards that, what's mostly happened is that folks 
have complained about auto-decoding without doing much to 
improve the current situation. There's a lot more to this than 
simply ripping out auto-decoding even if every D user on the 
planet agreed that outright breaking almost every existing D 
program to get rid of auto-decoding was worth it. But as with 
too many things around here, there's a lot more talking than 
working. And actually, as such, I should probably stop 
discussing this and go do something useful.


- Jonathan M Davis


A tutorial page linked from the front page with some examples 
would go a long way to making it easier for people.  If I had 
time and understood strings enough to explain to others I would 
try to make a start, but unfortunately neither are true.


And if we are doing things right with RCString, then isn't it 
easier to make the change with that first - which is new so can't 
break code - and in some years when people are used to working 
that way update Phobos (compiler switch in beginning and have big 
transition a few years after that).


Isn't this one of the challenges created by the tension between D 
being both a high-level and low-level language?  The higher the 
aim, the more problems you will encounter getting there.  That's 
okay.


And isn't the obstacle to removing auto-decoding that it seems 
to be a monolithic challenge of overwhelming magnitude?  If we 
could figure out some steps to eat the elephant one mouthful at 
a time (which might mean starting with RCString), then it would 
seem less intimidating.  It will take years anyway perhaps - 
but so what?






2018-09-06 Thread RhyS via Digitalmars-d

On Thursday, 6 September 2018 at 17:19:01 UTC, Joakim wrote:
No, Swift counts grapheme clusters by default, so it gives 1. I 
suggest you read the linked Swift chapter above. I think it's 
the wrong choice for performance, but they chose to emphasize 
intuitiveness for the common case.


I'd like to point out that Swift spent a lot of time reworking how 
strings are handled.


If my memory serves me well, they have reworked strings from 
version 2 to 3 and finalized it in version 4.


Swift 4 includes a faster, easier to use String implementation 
that retains Unicode correctness and adds support for creating, 
using and managing substrings.


It took them somewhere along the lines of two years to get 
string handling to an acceptable and predictable state. It 
annoyed the Swift user base greatly, but a lot of changes were made 
to reach a stable API.


To be honest, I personally find Swift an easier language, 
despite it lacking IDE support on several platforms and having no 
official Windows compiler.



2018-09-06 Thread aliak via Digitalmars-d
On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis 
wrote:
On Thursday, September 6, 2018 1:04:45 PM MDT aliak via 
Digitalmars-d wrote:

D makes the code-point case default and hence that becomes the
simplest to use. But unfortunately, the only thing I can think 
of

that requires code point representations is when dealing
specifically with unicode algorithms (normalization, etc). 
Here's

a good read on code points:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -

tl;dr: application logic does not need or want to deal with 
code points. For speed units work, and for correctness, 
graphemes work.


I think that it's pretty clear that code points are objectively 
the worst level to be the default. Unfortunately, changing it 
to _anything_ else is not going to be an easy feat at this 
point. But if we can first ensure that Phobos in general 
doesn't rely on it (i.e. in general, it can deal with ranges of 
char, wchar, dchar, or graphemes correctly rather than assuming 
that all ranges of characters are ranges of dchar), then maybe 
we can figure something out. Unfortunately, while some work has 
been done towards that, what's mostly happened is that folks 
have complained about auto-decoding without doing much to 
improve the current situation. There's a lot more to this than 
simply ripping out auto-decoding even if every D user on the 
planet agreed that outright breaking almost every existing D 
program to get rid of auto-decoding was worth it. But as with 
too many things around here, there's a lot more talking than 
working. And actually, as such, I should probably stop 
discussing this and go do something useful.


- Jonathan M Davis


Is there a unittest somewhere in phobos you know that one can be 
pointed to that shows the handling of these 4 variations you say 
should be dealt with first? Or maybe a PR that did some of this 
work that one could investigate?


I ask so I can see in code what it means to make something not 
rely on autodecoding and deal with ranges of char, wchar, dchar 
or graphemes.


Or a current "easy" bugzilla issue maybe that one could try a 
hand at?



2018-09-06 Thread Jonathan M Davis via Digitalmars-d
On Thursday, September 6, 2018 1:04:45 PM MDT aliak via Digitalmars-d wrote:
> D makes the code-point case default and hence that becomes the
> simplest to use. But unfortunately, the only thing I can think of
> that requires code point representations is when dealing
> specifically with unicode algorithms (normalization, etc). Here's
> a good read on code points:
> https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -
>
> tl;dr: application logic does not need or want to deal with code
> points. For speed units work, and for correctness, graphemes work.

I think that it's pretty clear that code points are objectively the worst
level to be the default. Unfortunately, changing it to _anything_ else is
not going to be an easy feat at this point. But if we can first ensure that
Phobos in general doesn't rely on it (i.e. in general, it can deal with
ranges of char, wchar, dchar, or graphemes correctly rather than assuming
that all ranges of characters are ranges of dchar), then maybe we can figure
something out. Unfortunately, while some work has been done towards that,
what's mostly happened is that folks have complained about auto-decoding
without doing much to improve the current situation. There's a lot more to
this than simply ripping out auto-decoding even if every D user on the
planet agreed that outright breaking almost every existing D program to get
rid of auto-decoding was worth it. But as with too many things around here,
there's a lot more talking than working. And actually, as such, I should
probably stop discussing this and go do something useful.

- Jonathan M Davis






2018-09-06 Thread aliak via Digitalmars-d

On Thursday, 6 September 2018 at 16:44:11 UTC, H. S. Teoh wrote:
On Thu, Sep 06, 2018 at 02:42:58PM +, Dukc via 
Digitalmars-d wrote:

On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> // D
> auto a = "á";
> auto b = "á";
> auto c = "\u200B";
> auto x = a ~ c ~ a;
> auto y = b ~ c ~ b;
> 
> writeln(a.length); // 2 wtf
> writeln(b.length); // 3 wtf
> writeln(x.length); // 7 wtf
> writeln(y.length); // 9 wtf

[...]

This is an unfair comparison.  In the Swift version you used 
.count, but here you used .length, which is the length of the 
array, NOT the number of characters or whatever you expect it 
to be.  You should rather use .count and specify exactly what 
you want to count, e.g., byCodePoint or byGrapheme.


I suspect the Swift version will give you unexpected results if 
you did something like compare "á" to "a\u301", for example 
(which, in case it isn't obvious, are visually identical to 
each other, and as far as an end user is concerned, should only 
count as 1 grapheme).


Not even normalization will help you if you have a string like 
"a\u301\u302": in that case, the *only* correct way to count 
the number of visual characters is byGrapheme, and I highly 
doubt Swift's .count will give you the correct answer in that 
case. (I expect that Swift's .count will count code points, as 
is the usual default in many languages, which is unfortunately 
wrong when you're thinking about visual characters, which are 
called graphemes in Unicode parlance.)


And even in your given example, what should .count return when 
there's a zero-width character?  If you're counting the number 
of visual places taken by the string (e.g., you're trying to 
align output in a fixed-width terminal), then *both* versions 
of your code are wrong, because zero-width characters do not 
occupy any space when displayed. If you're counting the number 
of code points, though, e.g., to allocate the right buffer size 
to convert to dstring, then you want to count the zero-width 
character as 1 rather than 0.  And that's not to mention 
double-width characters, which should count as 2 if you're 
outputting to a fixed-width terminal.


Again I say, you need to know how Unicode works. Otherwise you 
can easily deceive yourself to think that your code (both in D 
and in Swift and in any other language) is correct, when in 
fact it will fail miserably when it receives input that you 
didn't think of.  Unicode is NOT ASCII, and you CANNOT assume 
there's a 1-to-1 mapping between "characters" and display 
length. Or 1-to-1 mapping between any of the various concepts 
of string "length", in fact.


In ASCII, array length == number of code points == number of 
graphemes == display width.


In Unicode, array length != number of code points != number of 
graphemes != display width.


Code written by anyone who does not understand this is WRONG, 
because you will inevitably end up using the wrong value for 
the wrong thing: e.g., array length for number of code points, 
or number of code points for display length. Not even 
.byGrapheme will save you here; you *need* to understand that 
zero-width and double-width characters exist, and what they 
imply for display width. You *need* to understand the 
difference between code points and graphemes.  There is no 
single default that will work in every case, because there are 
DIFFERENT CORRECT ANSWERS depending on what your code is trying 
to accomplish. Pretending that you can just brush all this 
detail under the rug of a single number is just deceiving 
yourself, and will inevitably result in wrong code that will 
fail to handle Unicode input correctly.



T


It's a totally fair comparison. .count in swift is the equivalent 
of .length in D, you use that to get the size of an array, etc. 
They've just implemented string.length as 
string.byGrapheme.walkLength. So it's intuitively correct (and 
yes, slower). If you didn't want the default though then you 
could also specify what "view" over characters you want. E.g.


let a = "á̂"
a.count // 1 <-- Yes it is exactly as expected.
a.unicodeScalars.count // 3
a.utf8.count // 5
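
For reference, a rough D counterpart of those three views, as a sketch. It spells the string with explicit escapes (assuming the literal above is 'a' followed by U+0301 and U+0302), and the Lengths struct and lengthsOf function are hypothetical names for the example. The library calls are the existing std.utf.byDchar, std.uni.byGrapheme and std.range.walkLength.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byDchar;

/// Hypothetical container for the three notions of "length".
struct Lengths
{
    size_t codeUnits;
    size_t codePoints;
    size_t graphemes;
}

/// Hypothetical helper: measures a UTF-8 string at each level.
Lengths lengthsOf(string s)
{
    return Lengths(
        s.length,                 // UTF-8 code units (array length)
        s.byDchar.walkLength,     // code points
        s.byGrapheme.walkLength); // graphemes
}

void main()
{
    auto l = lengthsOf("a\u0301\u0302");
    writeln(l.graphemes);  // 1
    writeln(l.codePoints); // 3
    writeln(l.codeUnits);  // 5
}
```

So the same numbers are available in D, but the cheapest one is what .length gives you and the others have to be asked for explicitly.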

I don't really see any issues with a zero-width character. If you 
want to deal with screen width (i.e. pixel space) that's not the 
same as how many characters are in a string. And it doesn't 
matter whether you go byGrapheme or byCodePoint or byCodeUnit 
because none of those represent a single column on screen. A 
zero-width character is 0 *width* but it's still *one* character. 
There's no .length/size/count in any language (that I've heard 
of) that'll give you your screen space from their string type. 
You query the font API for that as that depends on font size, 
kerning, style and face.


And again, I agree you need to know how unicode works. I don't 
argue that at all. I'm just saying that having the default be 
incorrect for application logic is just silly and when people 
have to do things like string.representation.normalize.byGrapheme 
or whatever 


2018-09-06 Thread Jonathan M Davis via Digitalmars-d
On Thursday, September 6, 2018 10:44:11 AM MDT H. S. Teoh via Digitalmars-d 
wrote:
> On Thu, Sep 06, 2018 at 02:42:58PM +, Dukc via Digitalmars-d wrote:
> > On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> > > // D
> > > auto a = "á";
> > > auto b = "á";
> > > auto c = "\u200B";
> > > auto x = a ~ c ~ a;
> > > auto y = b ~ c ~ b;
> > >
> > > writeln(a.length); // 2 wtf
> > > writeln(b.length); // 3 wtf
> > > writeln(x.length); // 7 wtf
> > > writeln(y.length); // 9 wtf
>
> [...]
>
> This is an unfair comparison.  In the Swift version you used .count, but
> here you used .length, which is the length of the array, NOT the number
> of characters or whatever you expect it to be.  You should rather use
> .count and specify exactly what you want to count, e.g., byCodePoint or
> byGrapheme.
>
> I suspect the Swift version will give you unexpected results if you did
> something like compare "á" to "a\u301", for example (which, in case it
> isn't obvious, are visually identical to each other, and as far as an
> end user is concerned, should only count as 1 grapheme).
>
> Not even normalization will help you if you have a string like
> "a\u301\u302": in that case, the *only* correct way to count the number
> of visual characters is byGrapheme, and I highly doubt Swift's .count
> will give you the correct answer in that case. (I expect that Swift's
> .count will count code points, as is the usual default in many
> languages, which is unfortunately wrong when you're thinking about
> visual characters, which are called graphemes in Unicode parlance.)
>
> And even in your given example, what should .count return when there's a
> zero-width character?  If you're counting the number of visual places
> taken by the string (e.g., you're trying to align output in a
> fixed-width terminal), then *both* versions of your code are wrong,
> because zero-width characters do not occupy any space when displayed. If
> you're counting the number of code points, though, e.g., to allocate the
> right buffer size to convert to dstring, then you want to count the
> zero-width character as 1 rather than 0.  And that's not to mention
> double-width characters, which should count as 2 if you're outputting to
> a fixed-width terminal.
>
> Again I say, you need to know how Unicode works. Otherwise you can
> easily deceive yourself to think that your code (both in D and in Swift
> and in any other language) is correct, when in fact it will fail
> miserably when it receives input that you didn't think of.  Unicode is
> NOT ASCII, and you CANNOT assume there's a 1-to-1 mapping between
> "characters" and display length. Or 1-to-1 mapping between any of the
> various concepts of string "length", in fact.
>
> In ASCII, array length == number of code points == number of graphemes
> == display width.
>
> In Unicode, array length != number of code points != number of graphemes
> != display width.
>
> Code written by anyone who does not understand this is WRONG, because
> you will inevitably end up using the wrong value for the wrong thing:
> e.g., array length for number of code points, or number of code points
> for display length. Not even .byGrapheme will save you here; you *need*
> to understand that zero-width and double-width characters exist, and
> what they imply for display width. You *need* to understand the
> difference between code points and graphemes.  There is no single
> default that will work in every case, because there are DIFFERENT
> CORRECT ANSWERS depending on what your code is trying to accomplish.
> Pretending that you can just brush all this detail under the rug of a
> single number is just deceiving yourself, and will inevitably result in
> wrong code that will fail to handle Unicode input correctly.

Indeed. And unfortunately, the net result is that a large percentage of the
string-processing code out there is going to be wrong, and I don't think
that there's any way around that, because Unicode is simply too complicated
for the average programmer to understand it (sad as that may be) -
especially when most of them don't want to have to understand it.

Really, I'd say that there are only three options that even might be sane if
you really have the flexibility to design a proper solution:

1. Treat strings as ranges of code units by default.

2. Don't allow strings to be ranges, to be iterated, or indexed. They're
opaque types.

3. Treat strings as ranges of graphemes.

If strings are treated as ranges of code units by default (particularly if
they're UTF-8), you'll get failures very quickly if you're dealing with
non-ASCII, and you screw up the Unicode handling. It's also by far the most
performant solution and in many cases is exactly the right thing to do.
Obviously, something like byCodePoint or byGrapheme would then be needed in
the cases where code points or graphemes are the appropriate level to
iterate at.
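
A minimal sketch of what the explicit levels in option 1 look like in today's Phobos, assuming `std.utf.byCodeUnit`, `std.utf.byDchar` and `std.uni.byGrapheme`. The sample string (an "a" followed by U+0301, written with escapes so the encoding is unambiguous) is only an illustration:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // "a" followed by COMBINING ACUTE ACCENT: one visible character.
    string s = "a\u0301";

    // Code units: the raw UTF-8 bytes (1 for "a", 2 for U+0301).
    assert(s.byCodeUnit.walkLength == 3);

    // Code points: "a" and U+0301.
    assert(s.byDchar.walkLength == 2);

    // Grapheme clusters: the accent attaches to the base letter.
    assert(s.byGrapheme.walkLength == 1);
}
```

Each call names the level it iterates at, so the choice is visible at the call site instead of hidden in a default.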

If strings are opaque types (with ways to get ranges over code units, code
points, etc.), 

Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Joakim via Digitalmars-d

On Thursday, 6 September 2018 at 16:44:11 UTC, H. S. Teoh wrote:
On Thu, Sep 06, 2018 at 02:42:58PM +, Dukc via 
Digitalmars-d wrote:

On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> // D
> auto a = "á";
> auto b = "á";
> auto c = "\u200B";
> auto x = a ~ c ~ a;
> auto y = b ~ c ~ b;
> 
> writeln(a.length); // 2 wtf

> writeln(b.length); // 3 wtf
> writeln(x.length); // 7 wtf
> writeln(y.length); // 9 wtf

[...]

This is an unfair comparison.  In the Swift version you used 
.count, but here you used .length, which is the length of the 
array, NOT the number of characters or whatever you expect it 
to be.  You should rather use .count and specify exactly what 
you want to count, e.g., byCodePoint or byGrapheme.


I suspect the Swift version will give you unexpected results if 
you did something like compare "á" to "a\u301", for example 
(which, in case it isn't obvious, are visually identical to 
each other, and as far as an end user is concerned, should only 
count as 1 grapheme).


Not even normalization will help you if you have a string like 
"a\u301\u302": in that case, the *only* correct way to count 
the number of visual characters is byGrapheme, and I highly 
doubt Swift's .count will give you the correct answer in that 
case. (I expect that Swift's .count will count code points, as 
is the usual default in many languages, which is unfortunately 
wrong when you're thinking about visual characters, which are 
called graphemes in Unicode parlance.)


No, Swift counts grapheme clusters by default, so it gives 1. I 
suggest you read the linked Swift chapter above. I think it's the 
wrong choice for performance, but they chose to emphasize 
intuitiveness for the common case.


I agree with most of the rest of what you wrote about programmers 
having no silver bullet to avoid Unicode's and languages' 
complexity.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread H. S. Teoh via Digitalmars-d
On Thu, Sep 06, 2018 at 02:42:58PM +, Dukc via Digitalmars-d wrote:
> On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> > // D
> > auto a = "á";
> > auto b = "á";
> > auto c = "\u200B";
> > auto x = a ~ c ~ a;
> > auto y = b ~ c ~ b;
> > 
> > writeln(a.length); // 2 wtf
> > writeln(b.length); // 3 wtf
> > writeln(x.length); // 7 wtf
> > writeln(y.length); // 9 wtf
[...]

This is an unfair comparison.  In the Swift version you used .count, but
here you used .length, which is the length of the array, NOT the number
of characters or whatever you expect it to be.  You should rather use
.count and specify exactly what you want to count, e.g., byCodePoint or
byGrapheme.

I suspect the Swift version will give you unexpected results if you did
something like compare "á" to "a\u301", for example (which, in case it
isn't obvious, are visually identical to each other, and as far as an
end user is concerned, should only count as 1 grapheme).

Not even normalization will help you if you have a string like
"a\u301\u302": in that case, the *only* correct way to count the number
of visual characters is byGrapheme, and I highly doubt Swift's .count
will give you the correct answer in that case. (I expect that Swift's
.count will count code points, as is the usual default in many
languages, which is unfortunately wrong when you're thinking about
visual characters, which are called graphemes in Unicode parlance.)

And even in your given example, what should .count return when there's a
zero-width character?  If you're counting the number of visual places
taken by the string (e.g., you're trying to align output in a
fixed-width terminal), then *both* versions of your code are wrong,
because zero-width characters do not occupy any space when displayed. If
you're counting the number of code points, though, e.g., to allocate the
right buffer size to convert to dstring, then you want to count the
zero-width character as 1 rather than 0.  And that's not to mention
double-width characters, which should count as 2 if you're outputting to
a fixed-width terminal.

Again I say, you need to know how Unicode works. Otherwise you can
easily deceive yourself to think that your code (both in D and in Swift
and in any other language) is correct, when in fact it will fail
miserably when it receives input that you didn't think of.  Unicode is
NOT ASCII, and you CANNOT assume there's a 1-to-1 mapping between
"characters" and display length. Or 1-to-1 mapping between any of the
various concepts of string "length", in fact.

In ASCII, array length == number of code points == number of graphemes
== display width.

In Unicode, array length != number of code points != number of graphemes
!= display width.

Code written by anyone who does not understand this is WRONG, because
you will inevitably end up using the wrong value for the wrong thing:
e.g., array length for number of code points, or number of code points
for display length. Not even .byGrapheme will save you here; you *need*
to understand that zero-width and double-width characters exist, and
what they imply for display width. You *need* to understand the
difference between code points and graphemes.  There is no single
default that will work in every case, because there are DIFFERENT
CORRECT ANSWERS depending on what your code is trying to accomplish.
Pretending that you can just brush all this detail under the rug of a
single number is just deceiving yourself, and will inevitably result in
wrong code that will fail to handle Unicode input correctly.
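
The ASCII-versus-Unicode point above can be sketched roughly as follows. This assumes current Phobos behaviour (autodecoding on, so `walkLength` on a `string` counts code points), and the sample strings are illustrative. Display width is left out because I am not assuming any standard-library function for it:

```d
import std.conv : to;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string ascii = "abc";
    // "a" + COMBINING ACUTE ACCENT + COMBINING CIRCUMFLEX ACCENT
    string accented = "a\u0301\u0302";

    // ASCII: array length, code points and graphemes all coincide.
    assert(ascii.length == 3);
    assert(ascii.walkLength == 3);
    assert(ascii.byGrapheme.walkLength == 3);

    // Unicode: three different numbers for the same string.
    assert(accented.length == 5);                // UTF-8 code units
    assert(accented.walkLength == 3);            // code points
    assert(accented.byGrapheme.walkLength == 1); // graphemes

    // Sizing a dstring buffer is a code point question,
    // not a .length or grapheme question.
    dstring wide = accented.to!dstring;
    assert(wide.length == 3);
}
```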


T

-- 
It's amazing how careful choice of punctuation can leave you hanging:


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Daniel Kozak via Digitalmars-d
On Thu, Sep 6, 2018 at 4:45 PM Dukc via Digitalmars-d <
digitalmars-d@puremagic.com> wrote:

> On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> > // D
> > auto a = "á";
> > auto b = "á";
> > auto c = "\u200B";
> > auto x = a ~ c ~ a;
> > auto y = b ~ c ~ b;
> >
> > writeln(a.length); // 2 wtf
> > writeln(b.length); // 3 wtf
> > writeln(x.length); // 7 wtf
> > writeln(y.length); // 9 wtf
> >
> > writeln(a == b); // false wtf
> > writeln("ááá".canFind("á")); // false wtf
> >
>
> I had to copy-paste that because I wondered how the last two can
> be false. They are false because á is encoded differently. If you
> replace all occurrences of it with a grapheme that fits in one
> code point, the results are:
>
> 2
> 2
> 7
> 7
> true
> true
>

import std.stdio;
import std.algorithm : canFind;
import std.uni : normalize;

void main()
{
auto a = "á".normalize;
auto b = "á".normalize;
auto c = "\u200B".normalize;
auto x = a ~ c ~ a;
auto y = b ~ c ~ b;

writeln(a.length); // 2
writeln(b.length); // 2
writeln(x.length); // 7
writeln(y.length); // 7

writeln(a == b); // true
writeln("ááá".canFind("á".normalize)); // true
}


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Guillaume Piolat via Digitalmars-d

On Thursday, 6 September 2018 at 14:42:14 UTC, Chris wrote:

Usually a sign to move on...


You have said that at least 10 times in this very thread. 
Doomsayers are as old as D. It will be doing OK.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Laurent Tréguier via Digitalmars-d

On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
Hehe, it's already a bit laughable that correctness is not 
preferred.


// Swift
let a = "á"
let b = "á"
let c = "\u{200B}" // zero width space
let x = a + c + a
let y = b + c + b

print(a.count) // 1
print(b.count) // 1
print(x.count) // 3
print(y.count) // 3

print(a == b) // true
print("ááá".range(of: "á") != nil) // true

// D
auto a = "á";
auto b = "á";
auto c = "\u200B";
auto x = a ~ c ~ a;
auto y = b ~ c ~ b;

writeln(a.length); // 2 wtf
writeln(b.length); // 3 wtf
writeln(x.length); // 7 wtf
writeln(y.length); // 9 wtf

writeln(a == b); // false wtf
writeln("ááá".canFind("á")); // false wtf


writeln(cast(ubyte[]) a); // [195, 161]
writeln(cast(ubyte[]) b); // [97, 204, 129]

At least for equality, it doesn't seem far-fetched to me that 
the two are not considered equal if their bytes are not the same.
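
For anyone who does want the two spellings to compare equal, a hedged sketch using `std.uni.normalize`. The strings are written with escapes (precomposed U+00E1 versus "a" + U+0301) so the example does not depend on how a mail client encoded the literal:

```d
import std.algorithm.searching : canFind;
import std.string : representation;
import std.uni : NFC, NFD, normalize;

void main()
{
    string composed = "\u00E1";    // precomposed "á"
    string decomposed = "a\u0301"; // "a" + combining acute accent

    // Same bytes as in the writeln output above.
    immutable ubyte[] composedBytes = [195, 161];
    immutable ubyte[] decomposedBytes = [97, 204, 129];
    assert(composed.representation == composedBytes);
    assert(decomposed.representation == decomposedBytes);

    // Plain == compares code units, so the strings differ.
    assert(composed != decomposed);
    assert(!composed.canFind(decomposed));

    // After normalizing both sides to the same form they match.
    assert(normalize!NFC(composed) == normalize!NFC(decomposed));
    assert(normalize!NFD(composed) == normalize!NFD(decomposed));
    assert(normalize!NFC(composed).canFind(normalize!NFC(decomposed)));
}
```

Whether normalized comparison should be the default is exactly what this thread is arguing about; the sketch only shows that it is available as an explicit step.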


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d
On Thursday, 6 September 2018 at 14:30:38 UTC, Guillaume Piolat 
wrote:

On Thursday, 6 September 2018 at 13:30:11 UTC, Chris wrote:
And autodecode is a good example of experts getting it wrong, 
because, you know, you cannot be an expert in all fields. I 
think the problem was that it was discovered too late.


There are very valid reasons not to talk about auto-decoding 
again:


- it's too late to remove because breakage
- attempts at removing it were _already_ tried
- it has been debated to DEATH
- there is an easy work-around

So any discussion _now_ would have the very same structure of 
the discussion _then_, and would lead to the exact same result. 
It's quite tragic. And I urge the real D supporters to let such 
conversation die (topics debated to death) as soon as they 
appear.


The real supporters? So it's a religion? For me it's about 
technology and finding a good tool for a job.



why shouldn't users be allowed to give feedback?

Straw-man.


I meant in _general_, not necessarily autodecode ;)

If we don't get over _some_ technical debate, the only thing 
that is achieved is a loss of time for everyone involved.


Translation: "Nothing to see here, move along!" Usually a sign to 
move on...


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Dukc via Digitalmars-d

On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:

// D
auto a = "á";
auto b = "á";
auto c = "\u200B";
auto x = a ~ c ~ a;
auto y = b ~ c ~ b;

writeln(a.length); // 2 wtf
writeln(b.length); // 3 wtf
writeln(x.length); // 7 wtf
writeln(y.length); // 9 wtf

writeln(a == b); // false wtf
writeln("ááá".canFind("á")); // false wtf



I had to copy-paste that because I wondered how the last two can 
be false. They are false because á is encoded differently. If you 
replace all occurrences of it with a grapheme that fits in one 
code point, the results are:


2
2
7
7
true
true


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Ola Fosheim Grøstad via Digitalmars-d
On Thursday, 6 September 2018 at 14:33:27 UTC, rikki cattermole 
wrote:
Either decide on a list of conditions before we can break things 
to remove it, or yes, let's let this idea go. It isn't helping 
anyone.


Can't you just mark it as deprecated and provide a library 
compatibility range (100% compatible)? Then people will just 
update their code to use the range...


This should be possible to achieve using automated 
source-to-source translation in most cases.







Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread rikki cattermole via Digitalmars-d

On 07/09/2018 2:30 AM, Guillaume Piolat wrote:

On Thursday, 6 September 2018 at 13:30:11 UTC, Chris wrote:
And autodecode is a good example of experts getting it wrong, because, 
you know, you cannot be an expert in all fields. I think the problem 
was that it was discovered too late.


There are very valid reasons not to talk about auto-decoding again:

- it's too late to remove because breakage
- attempts at removing it were _already_ tried
- it has been debated to DEATH
- there is an easy work-around

So any discussion _now_ would have the very same structure of the 
discussion _then_, and would lead to the exact same result. It's quite 
tragic. And I urge the real D supporters to let such conversation die 
(topics debated to death) as soon as they appear.


+1
Either decide on a list of conditions before we can break things to remove it, or 
yes, let's let this idea go. It isn't helping anyone.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Guillaume Piolat via Digitalmars-d

On Thursday, 6 September 2018 at 13:30:11 UTC, Chris wrote:
And autodecode is a good example of experts getting it wrong, 
because, you know, you cannot be an expert in all fields. I 
think the problem was that it was discovered too late.


There are very valid reasons not to talk about auto-decoding 
again:


- it's too late to remove because breakage
- attempts at removing it were _already_ tried
- it has been debated to DEATH
- there is an easy work-around

So any discussion _now_ would have the very same structure of the 
discussion _then_, and would lead to the exact same result. It's 
quite tragic. And I urge the real D supporters to let such 
conversation die (topics debated to death) as soon as they appear.





why shouldn't users be allowed to give feedback?

Straw-man.

If we don't get over _some_ technical debate, the only thing that 
is achieved is a loss of time for everyone involved.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread aliak via Digitalmars-d

On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh wrote:
Because grapheme decoding is SLOW, and most of the time you 
don't even need it anyway.  SLOW as in, it will easily add a 
factor of 3-5 (if not worse!) to your string processing time, 
which will make your natively-compiled D code a laughing stock 
of interpreted languages like Python.  It will make 
autodecoding look like an optimization(!).


Hehe, it's already a bit laughable that correctness is not 
preferred.


// Swift
let a = "á"
let b = "á"
let c = "\u{200B}" // zero width space
let x = a + c + a
let y = b + c + b

print(a.count) // 1
print(b.count) // 1
print(x.count) // 3
print(y.count) // 3

print(a == b) // true
print("ááá".range(of: "á") != nil) // true

// D
auto a = "á";
auto b = "á";
auto c = "\u200B";
auto x = a ~ c ~ a;
auto y = b ~ c ~ b;

writeln(a.length); // 2 wtf
writeln(b.length); // 3 wtf
writeln(x.length); // 7 wtf
writeln(y.length); // 9 wtf

writeln(a == b); // false wtf
writeln("ááá".canFind("á")); // false wtf

Tell me which one would cause the giggles again?

If speed is the preference over correctness (which I very much 
disagree with, but for arguments sake...) then still code points 
are the wrong choice. So, speed was obviously (??) not the reason 
to prefer code points as the default.


Here's a read on how Swift 4 strings behave. Absolutely amazing 
work there: https://oleb.net/blog/2017/11/swift-4-strings/




Grapheme decoding is really only necessary when (1) you're 
typesetting a Unicode string, and (2) you're counting the 
number of visual characters taken up by the string (though 
grapheme counting even in this case may not give you what you 
want, thanks to double-width characters, zero-width characters, 
etc. -- though it can form the basis of correct counting code).


Yeah nah. Those are not the only 2 cases *ever* where grapheme 
decoding is correct. I don't think one can list all the cases 
where grapheme decoding is the correct behavior. Off the top of my 
head you've already forgotten comparisons. And on top of that, 
comparing and counting have a bajillion* use cases.


* number is an exaggeration.



For all other cases, you really don't need grapheme decoding, 
and being forced to iterate over graphemes when unnecessary 
will add a horrible overhead, worse than autodecoding does 
today.


As opposed to being forced to iterate with incorrect results? I 
understand that it's slower. I just don't think that justifies 
incorrect output. I agree with everything you've said next 
though, that people should understand unicode.




//

Seriously, people need to get over the fantasy that they can 
just use Unicode without understanding how Unicode works.  Most 
of the time, you can get the illusion that it's working, but 
actually 99% of the time the code is actually wrong and will do 
the wrong thing when given an unexpected (but still valid) 
Unicode string.  You can't drive without a license, and even if 
you try anyway, the chances of ending up in a nasty accident are 
pretty high.  People *need* to learn how to use Unicode 
properly before complaining about why this or that doesn't work 
the way they thought it should work.


I agree that you should know about Unicode. And maybe you can't 
be correct 100% of the time but you can very well get much closer 
than where D is right now.


And yeah, you can't drive without a license, but most cars 
hopefully don't show you an incorrect speedometer reading because 
it produces faster drivers.





T
--
Gone Chopin. Bach in a minuet.


Lol :D



Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d
On Thursday, 6 September 2018 at 11:01:55 UTC, Guillaume Piolat 
wrote:




So Unicode in D works EXACTLY as expected, yet people in this 
thread act as if the house is on fire.


Expected by who? The Unicode expert or the user?

D dying because of auto-decoding? Who can possibly think that 
in their right mind?


Nobody, it's just another major issue to be fixed.

The worst part of this forum is that suddenly everyone, by 
virtue of posting in a newsgroup, is an anointed language 
design expert.


Let me break it to you: core developers are language experts. 
The rest of us are users, and yes, that doesn't necessarily 
make us qualified to design a language.


Calm down. I for my part never said I was an expert on language 
design.


Number one: experts do make mistakes too, there is nothing wrong 
with that. And autodecode is a good example of experts getting it 
wrong, because, you know, you cannot be an expert in all fields. 
I think the problem was that it was discovered too late.


Number two: why shouldn't users be allowed to give feedback? 
Engineers and developers need feedback, else we'd still be using 
the CLI, wouldn't we? The user doesn't need to be an expert to know 
what s/he likes and doesn't like and developers / engineers often 
have a different point of view as to what is important / annoying 
etc. That's why IT companies introduced customer service, because 
the direct interaction between developers and users would often 
end badly (disgruntled customers).





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Ola Fosheim Grøstad via Digitalmars-d
On Thursday, 6 September 2018 at 11:01:55 UTC, Guillaume Piolat 
wrote:
Let me break it to you: core developers are language experts. 
The rest of us are users, and yes, that doesn't necessarily 
make us qualified to design a language.


Who?





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 11:43:31 UTC, ag0aep6g wrote:


You say that D users shouldn't need a '"Unicode license" before 
they do anything with strings'. And you say that Python 3 gets 
it right (or maybe less wrong than D).


But here we see that Python requires a similar amount of 
Unicode knowledge. Without your Unicode license, you couldn't 
make sense of `len` giving different results for two strings 
that look the same.


So both D and Python require a Unicode license. But on top of 
that, D also requires an auto-decoding license. You need to 
know that `string` is both a range of code points and an array 
of code units. And you need to know that `.length` belongs to 
the array side, not the range side. Once you know that (and 
more), things start making sense in D.


You'll need some basic knowledge of Unicode, if you deal with 
strings, that's for sure. But you don't need a "license" and it 
certainly shouldn't be used as an excuse for D's confusing nature 
when it comes to strings. Unicode is confusing enough, so you 
don't need to add another layer of complexity to confuse users 
further. And most certainly you shouldn't blame the user for 
being confused. Afaik, there's no warning label with an 
accompanying user manual for string handling.


My point is: D doesn't require more Unicode knowledge than 
Python. But D's auto-decoding gives `string` a dual nature, and 
that can certainly be confusing. It's part of why everybody 
dislikes auto-decoding.


D should be clear about it. I think it's too late for `string` to 
change its behavior (i.e. "á".length = 1). If you wanna change 
`string`'s behavior now, maybe a compiler switch would be an 
option for the transition period: -autodecode=off.


Maybe a new type of string could be introduced that behaves like 
one would expect, say `ustring` for correct Unicode handling. Or 
`string` does that and you introduce a new type for high 
performance tasks (`rawstring` would unfortunately be confusing).


The thing is that even basic things like string handling are 
complicated and flawed so that I don't want to use D for any 
future projects and I don't have the time to wait until it gets 
fixed one day, if it ever will get fixed that is. Neither does it 
seem to be a priority as opposed to other things that are maybe 
less important for production. But at least I'm wiser after this 
thread, since it has been made clear that things are not gonna 
change soon, at least not soon enough for me.


This is why I'll file for D-vorce :) Will it be difficult? Maybe 
at the beginning, but it will make things easier in the long run. 
And at the end of the day, if you have to fix and rewrite parts 
of your code again and again due to frequent language changes, 
you might as well port it to a different PL altogether. But I 
have no hard feelings, it's a practical decision I had to make 
based on pros and cons.


[snip]




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread ag0aep6g via Digitalmars-d

On 09/06/2018 12:40 PM, Chris wrote:
To avoid this you have to normalize and recompose any decomposed 
characters. I remember that Mac OS X used (and still uses?) decomposed 
characters by default, so when you typed 'á' into your cli, it would 
automatically decompose it to 'a' + acute. `string` however returns 
len=2 for composed characters too. If you do a lot of string handling it 
will come back to bite you sooner or later.


You say that D users shouldn't need a '"Unicode license" before they do 
anything with strings'. And you say that Python 3 gets it right (or 
maybe less wrong than D).


But here we see that Python requires a similar amount of Unicode 
knowledge. Without your Unicode license, you couldn't make sense of 
`len` giving different results for two strings that look the same.


So both D and Python require a Unicode license. But on top of that, D 
also requires an auto-decoding license. You need to know that `string` 
is both a range of code points and an array of code units. And you need 
to know that `.length` belongs to the array side, not the range side. 
Once you know that (and more), things start making sense in D.
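
A small sketch of that dual nature, assuming current Phobos with auto-decoding enabled. The literal is written as an escape (precomposed U+00E1) to keep the encoding unambiguous:

```d
import std.range : ElementType, front, walkLength;

void main()
{
    string s = "\u00E1"; // precomposed "á", two UTF-8 code units

    // Array side: indexing and .length work on code units (char).
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 2);
    assert(s[0] == 0xC3);

    // Range side: front and walkLength work on code points (dchar).
    static assert(is(ElementType!string == dchar));
    assert(s.front == 0xE1);
    assert(s.walkLength == 1);
}
```

The same variable answers 2 or 1 depending on whether you ask the array or the range, which is the part that needs the extra "license".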


My point is: D doesn't require more Unicode knowledge than Python. But 
D's auto-decoding gives `string` a dual nature, and that can certainly 
be confusing. It's part of why everybody dislikes auto-decoding.


(Not saying that Python is free from such pitfalls. I simply don't know 
the language well enough.)


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 11:19:14 UTC, Chris wrote:



One problem imo is that they mixed the terms up: "Grapheme: A 
minimally distinctive unit of writing in the context of a 
particular writing system." In linguistics a grapheme is not a 
single character like "á" or "g". It may also be a combination 
of characters, as in the English spelling <sh> ("s" + "h"), which 
maps to a phoneme (e.g. ship, shut, shadow). In German this 
sound is written as <sch>, as in "Schiff" (ship) (but not 
always, cf. "s" in "Stange").




Sorry, this should read "In linguistics a grapheme is not 
_necessarily_ _only_ a single character like "á" or "g"."


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 10:44:45 UTC, Joakim wrote:
[snip]


You're not being fair here, Chris. I just saw this SO question 
that I think exemplifies how most programmers react to Unicode:


"Trying to understand the subtleties of modern Unicode is 
making my head hurt. In particular, the distinction between 
code points, characters, glyphs and graphemes - concepts which 
in the simplest case, when dealing with English text using 
ASCII characters, all have a one-to-one relationship with each 
other - is causing me trouble.


Seeing how these terms get used in documents like Matthias 
Bynens' JavaScript has a unicode problem or Wikipedia's piece 
on Han unification, I've gathered that these concepts are not 
the same thing and that it's dangerous to conflate them, but 
I'm kind of struggling to grasp what each term means.


The Unicode Consortium offers a glossary to explain this stuff, 
but it's full of "definitions" like this:


Abstract Character. A unit of information used for the 
organization, control, or representation of textual data. ...


...

Character. ... (2) Synonym for abstract character. (3) The 
basic unit of encoding for the Unicode character encoding. ...


...

Glyph. (1) An abstract form that represents one or more glyph 
images. (2) A synonym for glyph image. In displaying Unicode 
character data, one or more glyphs may be selected to depict a 
particular character.


...

Grapheme. (1) A minimally distinctive unit of writing in the 
context of a particular writing system. ...


Most of these definitions possess the quality of sounding very 
academic and formal, but lack the quality of meaning anything, 
or else defer the problem of definition to yet another glossary 
entry or section of the standard.


So I seek the arcane wisdom of those more learned than I. How 
exactly do each of these concepts differ from each other, and 
in what circumstances would they not have a one-to-one 
relationship with each other?"

https://stackoverflow.com/questions/27331819/whats-the-difference-between-a-character-a-code-point-a-glyph-and-a-grapheme

Honestly, Unicode is a mess, and I believe we will all have to 
dump the Unicode standard and start over one day. Until that 
fine day, there is no neat solution to how to handle it, no 
matter how much you'd like to think so. Also, much of the 
complexity actually comes from the complexity of the various 
language alphabets, so that cannot be waved away no matter what 
standard you come up with, though Unicode certainly adds more 
unneeded complexity on top, which is why it should be dumped.


One problem imo is that they mixed the terms up: "Grapheme: A 
minimally distinctive unit of writing in the context of a 
particular writing system." In linguistics a grapheme is not a 
single character like "á" or "g". It may also be a combination of 
characters, as in the English spelling <sh> ("s" + "h"), which maps to 
a phoneme (e.g. ship, shut, shadow). In German this sound is 
written as <sch>, as in "Schiff" (ship) (but not always, cf. "s" 
in "Stange").


Since Unicode is such a difficult beast to deal with, I'd say D 
(or any PL for that matter) needs, first and foremost, a clear 
policy about what's the default behavior - not ad hoc patches. 
Then maybe a strategy as to how the default behavior can be 
turned on and off, say for performance reasons. One way _could_ 
be a compiler switch to turn the default behavior on/off -unicode 
or -uni or -utf8 or whatever, or maybe better a library solution 
like `ustring`.


If you need high performance and checks are no issue for the most 
part (web crawling, data harvesting etc), get rid of 
autodecoding. Once you need to check for character/grapheme 
correctness (e.g. translation tools) make it available through 
something like `to!ustring`. Whichever way: be clear about it. 
But don't let the unsuspecting user use `string` and get bitten 
by it.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Guillaume Piolat via Digitalmars-d

On Wednesday, 5 September 2018 at 07:48:34 UTC, Chris wrote:


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)

  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`

  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}



So Unicode in D works EXACTLY as expected, yet people in this 
thread act as if the house is on fire.


D dying because of auto-decoding? Who in their right mind could 
possibly think that?


The worst part of this forum is that suddenly everyone, by virtue 
of posting in a newsgroup, is an anointed language design expert.


Let me break it to you: core developers are language experts. 
The rest of us are users, and no, that doesn't necessarily make 
us qualified to design a language.





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Joakim via Digitalmars-d

On Thursday, 6 September 2018 at 09:35:27 UTC, Chris wrote:

On Thursday, 6 September 2018 at 08:44:15 UTC, nkm1 wrote:

On Wednesday, 5 September 2018 at 07:48:34 UTC, Chris wrote:
On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright 
wrote:


Autodecode - I've suffered under that, too. The solution was 
fairly simple. Append .byCodeUnit to strings that would 
otherwise autodecode. Annoying, but hardly a showstopper.


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)


  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`


  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}


And this has what to do with autodecoding?


Nothing. I was just pointing out how awkward some basic things 
can be. autodecoding just adds to it in the sense that it's a 
useless overhead but will keep string handling in a limbo 
forever and ever and ever.




TBH, it looks like you're just confused about how Unicode 
works. None of that is something particular to D. You should 
probably address your concerns to the Unicode Consortium. Not 
that they care.


I'm actually not confused since I've been dealing with Unicode 
(and encodings in general) for quite a while now. Although I'm 
not a Unicode expert, I know what the operations above do and 
why. I'd only expect a modern PL to deal with Unicode correctly 
and have some guidelines as to the nitty-gritty.


Since you understand Unicode well, enlighten us: what's the best 
default format to use for string iteration?


You can argue that D chose the wrong default by having the stdlib 
auto-decode to code points in several places, and Walter and a 
host of the core D team would agree with you, and you can add me 
to the list too. But it's not clear there should be a default 
format at all, other than whatever you started off with, 
particularly for a programming language that values performance 
like D does, as each format choice comes with various speed vs. 
correctness trade-offs.


Therefore, the programmer has to understand that complexity and 
make his own choice. You're acting like there's some obvious 
choice for how to handle Unicode that we're missing here, when 
the truth is that _no programming language knows how to handle 
unicode well_, since handling a host of world languages in a 
single format is _inherently unintuitive_ and has significant 
efficiency tradeoffs between the different formats.


And once again, it's the user's fault as in having some basic 
assumptions about how things should work. The user is just too 
stupid to use D properly - that's all. I know this type of 
behavior from the management of pubs and shops that had to 
close down, because nobody would go there anymore.


Do you know the book "Crónica de una muerte anunciada" 
(Chronicle of a Death Foretold) by Gabriel García Márquez?


"The central question at the core of the novella is how the 
death of Santiago Nasar was foreseen, yet no one tried to stop 
it."[1]


[1] 
https://en.wikipedia.org/wiki/Chronicle_of_a_Death_Foretold#Key_themes


You're not being fair here, Chris. I just saw this SO question 
that I think exemplifies how most programmers react to Unicode:


"Trying to understand the subtleties of modern Unicode is making 
my head hurt. In particular, the distinction between code points, 
characters, glyphs and graphemes - concepts which in the simplest 
case, when dealing with English text using ASCII characters, all 
have a one-to-one relationship with each other - is causing me 
trouble.


Seeing how these terms get used in documents like Matthias 
Bynens' JavaScript has a unicode problem or Wikipedia's piece on 
Han unification, I've gathered that these concepts are not the 
same thing and that it's dangerous to conflate them, but I'm kind 
of struggling to grasp what each term means.


The Unicode Consortium offers a glossary to explain this stuff, 
but it's full of "definitions" like this:


Abstract Character. A unit of information used for the 
organization, control, or representation of textual data. ...


...

Character. ... (2) Synonym for abstract character. (3) The basic 
unit of encoding for the Unicode character encoding. ...


...

Glyph. (1) An abstract form that represents one or more glyph 
images. (2) A synonym for glyph image. In displaying Unicode 
character data, one or more glyphs may be selected to depict a 
particular character.


...

Grapheme. (1) A minimally distinctive unit of writing in the 
context of a particular writing system. ...



Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 10:22:22 UTC, ag0aep6g wrote:

On 09/06/2018 09:23 AM, Chris wrote:

Python 3 gives me this:

print(len("á"))
1


Python 3 also gives you this:

print(len("á"))
2

(The example might not survive transfer from me to you if 
Unicode normalization happens along the way.)


That's when you enter the 'á' as 'a' followed by U+0301 
(combining acute accent). So Python's `len` counts in code 
points, like D's std.range does (auto-decoding).


To avoid this you have to normalize and recompose any decomposed 
characters. I remember that Mac OS X used (and still uses?) 
decomposed characters by default, so when you typed 'á' into your 
cli, it would automatically decompose it to 'a' + acute. `string` 
however returns len=2 for composed characters too. If you do a 
lot of string handling it will come back to bite you sooner or 
later.
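
A minimal sketch of the normalize-and-recompose step in D, assuming Phobos' `std.uni.normalize` is the tool you want for it. The helper name `recompose` is made up for this example.

```d
import std.range : walkLength;
import std.uni : normalize, NFC, NFD;

/// Hypothetical helper: bring a string into composed (NFC) form.
string recompose(string s)
{
    return normalize!NFC(s);
}

void main()
{
    string decomposed = "a\u0301";   // 'a' + combining acute accent

    // NFC merges the pair into the single code point U+00E1.
    string composed = recompose(decomposed);
    assert(composed == "\u00E1");
    assert(composed.walkLength == 1);   // one code point

    // `.length` still reports two, because it counts UTF-8 code units.
    assert(composed.length == 2);

    // NFD goes the other way, as a decomposing system would.
    assert(normalize!NFD(composed) == decomposed);
}
```

Normalizing at the input boundary at least makes comparisons and code point counts consistent; it does not change what `string.length` measures.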


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread ag0aep6g via Digitalmars-d

On 09/06/2018 09:23 AM, Chris wrote:

Python 3 gives me this:

print(len("á"))
1


Python 3 also gives you this:

print(len("á"))
2

(The example might not survive transfer from me to you if Unicode 
normalization happens along the way.)


That's when you enter the 'á' as 'a' followed by U+0301 (combining 
acute accent). So Python's `len` counts in code points, like D's 
std.range does (auto-decoding).
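
For comparison, a small D sketch of the three ways to count the same text. The helper names `codeUnits`, `codePoints` and `graphemes` are invented for the example; the calls underneath are `walkLength` from `std.range` and `byGrapheme` from `std.uni`.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// UTF-8 code units (bytes): what `.length` reports for a D `string`.
size_t codeUnits(string s) { return s.length; }

/// Code points: what auto-decoding iteration and Python 3's `len` count.
size_t codePoints(string s) { return s.walkLength; }

/// Grapheme clusters: the closest match to user-perceived characters.
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string composed = "\u00E1";     // precomposed á, one code point
    string decomposed = "a\u0301";  // 'a' + combining acute accent

    assert(codeUnits(composed) == 2 && codeUnits(decomposed) == 3);
    assert(codePoints(composed) == 1 && codePoints(decomposed) == 2);
    assert(graphemes(composed) == 1 && graphemes(decomposed) == 1);
}
```

Only the grapheme count agrees for both spellings, which is why neither the Python result nor the auto-decoded one is stable under normalization.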


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 08:44:15 UTC, nkm1 wrote:

On Wednesday, 5 September 2018 at 07:48:34 UTC, Chris wrote:
On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright 
wrote:


Autodecode - I've suffered under that, too. The solution was 
fairly simple. Append .byCodeUnit to strings that would 
otherwise autodecode. Annoying, but hardly a showstopper.


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)


  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`

  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}


And this has what to do with autodecoding?


Nothing. I was just pointing out how awkward some basic things 
can be. autodecoding just adds to it in the sense that it's a 
useless overhead but will keep string handling in a limbo forever 
and ever and ever.




TBH, it looks like you're just confused about how Unicode 
works. None of that is something particular to D. You should 
probably address your concerns to the Unicode Consortium. Not 
that they care.


I'm actually not confused since I've been dealing with Unicode 
(and encodings in general) for quite a while now. Although I'm 
not a Unicode expert, I know what the operations above do and 
why. I'd only expect a modern PL to deal with Unicode correctly 
and have some guidelines as to the nitty-gritty.


And once again, it's the user's fault as in having some basic 
assumptions about how things should work. The user is just too 
stupid to use D properly - that's all. I know this type of 
behavior from the management of pubs and shops that had to close 
down, because nobody would go there anymore.


Do you know the book "Crónica de una muerte anunciada" (Chronicle 
of a Death Foretold) by Gabriel García Márquez?


"The central question at the core of the novella is how the death 
of Santiago Nasar was foreseen, yet no one tried to stop it."[1]


[1] 
https://en.wikipedia.org/wiki/Chronicle_of_a_Death_Foretold#Key_themes


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread nkm1 via Digitalmars-d

On Wednesday, 5 September 2018 at 07:48:34 UTC, Chris wrote:
On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright 
wrote:


Autodecode - I've suffered under that, too. The solution was 
fairly simple. Append .byCodeUnit to strings that would 
otherwise autodecode. Annoying, but hardly a showstopper.


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)

  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`

  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}


And this has what to do with autodecoding?



Welcome to my world!



TBH, it looks like you're just confused about how Unicode works. 
None of that is something particular to D. You should probably 
address your concerns to the Unicode Consortium. Not that they 
care.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Thursday, 6 September 2018 at 07:54:09 UTC, Joakim wrote:

On Thursday, 6 September 2018 at 07:23:57 UTC, Chris wrote:
On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh 
wrote:




//

Seriously, people need to get over the fantasy that they can 
just use Unicode without understanding how Unicode works.  
Most of the time, you can get the illusion that it's working, 
but actually 99% of the time the code is actually wrong and 
will do the wrong thing when given an unexpected (but still 
valid) Unicode string.  You can't drive without a license, 
and even if you try anyway, the chances of ending up in a 
nasty accident are pretty high.  People *need* to learn how to 
use Unicode properly before complaining about why this or 
that doesn't work the way they thought it should work.



T


Python 3 gives me this:

print(len("á"))
1

and so do other languages.


The same Python 3 that people criticize for having unintuitive 
unicode string handling?


https://learnpythonthehardway.org/book/nopython3.html

Is it asking too much to ask for `string` (not `dstring` or 
`wstring`) to behave as most people would expect it to behave 
in 2018 - and not like Python 2 from days of yore? But of 
course, D users should have a "Unicode license" before they do 
anything with strings. (I wonder is there a different license 
for UTF8 and UTF16 and UTF32, Big / Little Endian, BOM? Just 
asking.)


Yes and no, unicode is a clusterf***, so every programming 
language is having problems with it.


So again, for the umpteenth time, it's the users' fault. I 
see. Ironically enough, it was the language developers' lack 
of understanding of Unicode that led to string handling being 
a nightmare in D in the first place. Oh lads, if you were 
politicians I'd say that with this attitude you're gonna lose the 
next election. I say this, because many times the posts by 
(core) developers remind me so much of politicians who are 
completely detached from the reality of the people. Right oh!


You have a point that it was D devs' ignorance of unicode that 
led to the current auto-decoding problem. But let's have some 
nuance here, the problem ultimately is unicode.


Yes, Unicode is a beast that is hard to tame. But there is, 
afaik, not even a proper plan to tackle the whole thing in D, 
just patches. D has autodecoding which slows things down but 
doesn't even work correctly at the same time. However, it cannot 
be removed due to massive code breakage. So you sacrifice speed 
for security (fine) - but the security doesn't even exist. So 
what's the point? Also, there aren't any guidelines about how to 
use strings in different contexts. So after a while your code 
ends up being a mess of .byCodePoint / .byGrapheme / string / 
dstring whatever, and you never know if you really got it right 
or not (performance-wise or otherwise).


We're talking about a basic functionality like string handling. 
String handling is very important these days (data harvesting, 
translation tools) and IT is used all over the world where you 
have to deal with different alphabets that are outside the ASCII 
range. And because it's such a basic functionality, you don't 
want to waste time having to think about it.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread rikki cattermole via Digitalmars-d

On 06/09/2018 7:54 PM, Joakim wrote:

On Thursday, 6 September 2018 at 07:23:57 UTC, Chris wrote:

On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh wrote:



//

Seriously, people need to get over the fantasy that they can just use 
Unicode without understanding how Unicode works. Most of the time, 
you can get the illusion that it's working, but actually 99% of the 
time the code is actually wrong and will do the wrong thing when 
given an unexpected (but still valid) Unicode string.  You can't 
drive without a license, and even if you try anyway, the chances of 
ending up in a nasty accident are pretty high.  People *need* to learn 
how to use Unicode properly before complaining about why this or that 
doesn't work the way they thought it should work.



T


Python 3 gives me this:

print(len("á"))
1

and so do other languages.


The same Python 3 that people criticize for having unintuitive unicode 
string handling?


https://learnpythonthehardway.org/book/nopython3.html

Is it asking too much to ask for `string` (not `dstring` or `wstring`) 
to behave as most people would expect it to behave in 2018 - and not 
like Python 2 from days of yore? But of course, D users should have a 
"Unicode license" before they do anything with strings. (I wonder is 
there a different license for UTF8 and UTF16 and UTF32, Big / Little 
Endian, BOM? Just asking.)


Yes and no, unicode is a clusterf***, so every programming language is 
having problems with it.


So again, for the umpteenth time, it's the users' fault. I see. 
Ironically enough, it was the language developers' lack of 
understanding of Unicode that led to string handling being a nightmare 
in D in the first place. Oh lads, if you were politicians I'd say that 
with this attitude you're gonna lose the next election. I say this, because 
many times the posts by (core) developers remind me so much of 
politicians who are completely detached from the reality of the 
people. Right oh!


You have a point that it was D devs' ignorance of unicode that led to 
the current auto-decoding problem. But let's have some nuance here, the 
problem ultimately is unicode.


Let's also be realistic here, when D was being designed UTF-16 was 
touted as being 'the' solution you should support e.g. Java had it 
retrofitted shortly before D. So it isn't anyone's fault on D's end.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Joakim via Digitalmars-d

On Thursday, 6 September 2018 at 07:23:57 UTC, Chris wrote:
On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh 
wrote:




//

Seriously, people need to get over the fantasy that they can 
just use Unicode without understanding how Unicode works.  
Most of the time, you can get the illusion that it's working, 
but actually 99% of the time the code is actually wrong and 
will do the wrong thing when given an unexpected (but still 
valid) Unicode string.  You can't drive without a license, and 
even if you try anyway, the chances of ending up in a nasty 
accident are pretty high.  People *need* to learn how to use 
Unicode properly before complaining about why this or that 
doesn't work the way they thought it should work.



T


Python 3 gives me this:

print(len("á"))
1

and so do other languages.


The same Python 3 that people criticize for having unintuitive 
unicode string handling?


https://learnpythonthehardway.org/book/nopython3.html

Is it asking too much to ask for `string` (not `dstring` or 
`wstring`) to behave as most people would expect it to behave 
in 2018 - and not like Python 2 from days of yore? But of 
course, D users should have a "Unicode license" before they do 
anything with strings. (I wonder is there a different license 
for UTF8 and UTF16 and UTF32, Big / Little Endian, BOM? Just 
asking.)


Yes and no, unicode is a clusterf***, so every programming 
language is having problems with it.


So again, for the umpteenth time, it's the users' fault. I see. 
Ironically enough, it was the language developers' lack of 
understanding of Unicode that led to string handling being a 
nightmare in D in the first place. Oh lads, if you were 
politicians I'd say that with this attitude you're gonna lose the 
next election. I say this, because many times the posts by 
(core) developers remind me so much of politicians who are 
completely detached from the reality of the people. Right oh!


You have a point that it was D devs' ignorance of unicode that 
led to the current auto-decoding problem. But let's have some 
nuance here, the problem ultimately is unicode.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Paolo Invernizzi via Digitalmars-d

On Thursday, 6 September 2018 at 07:23:57 UTC, Chris wrote:

Seriously, people need to get over the fantasy that they can 
just use Unicode without understanding how Unicode works.  
Most of the time, you can get the illusion that it's working, 
but actually 99% of the time the code is actually wrong and 
will do the wrong thing when given an unexpected (but still 
valid) Unicode string.


Is it asking too much to ask for `string` (not `dstring` or 
`wstring`) to behave as most people would expect it to behave 
in 2018 - and not like Python 2 from days of yore?


I agree with Chris.

That boat has sailed, so D2 should just go full throttle with the 
original design and auto decode to graphemes, regardless of the 
performance.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-06 Thread Chris via Digitalmars-d

On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh wrote:



//

Seriously, people need to get over the fantasy that they can 
just use Unicode without understanding how Unicode works.  Most 
of the time, you can get the illusion that it's working, but 
actually 99% of the time the code is actually wrong and will do 
the wrong thing when given an unexpected (but still valid) 
Unicode string.  You can't drive without a license, and even if 
you try anyway, the chances of ending up in a nasty accident are 
pretty high.  People *need* to learn how to use Unicode 
properly before complaining about why this or that doesn't work 
the way they thought it should work.



T


Python 3 gives me this:

print(len("á"))
1

and so do other languages.

Is it asking too much to ask for `string` (not `dstring` or 
`wstring`) to behave as most people would expect it to behave in 
2018 - and not like Python 2 from days of yore? But of course, D 
users should have a "Unicode license" before they do anything 
with strings. (I wonder is there a different license for UTF8 and 
UTF16 and UTF32, Big / Little Endian, BOM? Just asking.)


So again, for the umpteenth time, it's the users' fault. I see. 
Ironically enough, it was the language developers' lack of 
understanding of Unicode that led to string handling being a 
nightmare in D in the first place. Oh lads, if you were 
politicians I'd say that with this attitude you're gonna lose the next 
election. I say this, because many times the posts by (core) 
developers remind me so much of politicians who are completely 
detached from the reality of the people. Right oh!







Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-05 Thread H. S. Teoh via Digitalmars-d
On Wed, Sep 05, 2018 at 09:33:27PM +0000, aliak via Digitalmars-d wrote:
[...]
> The dstring is only ok because the 2 code units fit in a dchar right?
> But all the other ones are as expected right?

And dstring will be wrong once you have non-precomposed diacritics and
other composing sequences.
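
A short sketch of that failure mode, assuming a decomposed input. The helper name `graphemeCount` is invented for the example.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: count grapheme clusters in a UTF-32 string.
size_t graphemeCount(dstring s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    dstring precomposed = "\u00E1"d;   // one code point
    dstring decomposed = "a\u0301"d;   // base letter + combining accent

    // dstring.length counts UTF-32 code units, i.e. code points.
    assert(precomposed.length == 1);
    assert(decomposed.length == 2);    // same visible text, different count

    // Both are a single grapheme.
    assert(graphemeCount(precomposed) == 1);
    assert(graphemeCount(decomposed) == 1);
}
```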


> Seriously... why is it not graphemes by default for correctness
> whyyy!

Because grapheme decoding is SLOW, and most of the time you don't even
need it anyway.  SLOW as in, it will easily add a factor of 3-5 (if not
worse!) to your string processing time, which will make your
natively-compiled D code a laughing stock of interpreted languages like
Python.  It will make autodecoding look like an optimization(!).

Grapheme decoding is really only necessary when (1) you're typesetting a
Unicode string, and (2) you're counting the number of visual characters
taken up by the string (though grapheme counting even in this case may
not give you what you want, thanks to double-width characters,
zero-width characters, etc. -- though it can form the basis of correct
counting code).

For all other cases, you really don't need grapheme decoding, and being
forced to iterate over graphemes when unnecessary will add a horrible
overhead, worse than autodecoding does today.

//

Seriously, people need to get over the fantasy that they can just use
Unicode without understanding how Unicode works.  Most of the time, you
can get the illusion that it's working, but actually 99% of the time the
code is actually wrong and will do the wrong thing when given an
unexpected (but still valid) Unicode string.  You can't drive without a
license, and even if you try anyway, the chances of ending up in a nasty
accident are pretty high.  People *need* to learn how to use Unicode
properly before complaining about why this or that doesn't work the way
they thought it should work.


T

-- 
Gone Chopin. Bach in a minuet.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-05 Thread aliak via Digitalmars-d

On Wednesday, 5 September 2018 at 07:48:34 UTC, Chris wrote:
On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright 
wrote:


Autodecode - I've suffered under that, too. The solution was 
fairly simple. Append .byCodeUnit to strings that would 
otherwise autodecode. Annoying, but hardly a showstopper.


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)

  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`

  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}

Welcome to my world!

[snip]


The dstring is only ok because the 2 code units fit in a dchar 
right? But all the other ones are as expected right?


Seriously... why is it not graphemes by default for correctness 
whyyy!




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-05 Thread Walter Bright via Digitalmars-d

On 9/4/2018 5:37 PM, bachmeier wrote:
Having to deal with the 
possibility that others might have any of twelve different compiler versions 
installed just isn't sustainable.


Back in the bad old DOS days, my compiler depended on the Microsoft linker, 
which was helpfully included on the DOS distribution disks (!)


The problem, however, was Microsoft kept changing the linker, and every linker 
was different. At one point I had my "linker disk" which was packed with every 
version of MS-Link I could find.


Now that was unsustainable.

The eventual solution was that Bjorn Freeman-Benson wrote a linker (BLINK), which we 
then used. When it had a bug, we fixed it. When we shipped a compiler, it had a 
predictable linker with it. It made all the difference in the world.


Hence my penchant for "controlling our destiny" that I've remarked on now and 
then. It's also why the DMD toolchain is Boost licensed - nobody is subject to 
our whims.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-05 Thread Chris via Digitalmars-d

On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright wrote:


Autodecode - I've suffered under that, too. The solution was 
fairly simple. Append .byCodeUnit to strings that would 
otherwise autodecode. Annoying, but hardly a showstopper.


import std.array : array;
import std.stdio : writefln;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main() {

  string first = "á";

  writefln("%d", first.length);  // prints 2

  auto firstCU = "á".byCodeUnit; // type is `ByCodeUnitImpl` (!)

  writefln("%d", firstCU.length);  // prints 2

  auto firstGr = "á".byGrapheme.array;  // type is `Grapheme[]`

  writefln("%d", firstGr.length);  // prints 1

  auto firstCP = "á".byCodePoint.array; // type is `dchar[]`

  writefln("%d", firstCP.length);  // prints 1

  dstring second = "á";

  writefln("%d", second.length);  // prints 1 (That was easy!)

  // DMD64 D Compiler v2.081.2
}

Welcome to my world!

[snip]


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-04 Thread bachmeier via Digitalmars-d

On Friday, 24 August 2018 at 19:26:40 UTC, Walter Bright wrote:

On 8/24/2018 6:04 AM, Chris wrote:
For about a year I've had the feeling that D is moving too 
fast and going nowhere at the same time. D has to slow down 
and get stable. D is past the experimental stage. Too many 
people use it for real world programming and programmers value 
and _need_ both stability and consistency.


Every programmer who says this also demands new (and breaking) 
features.


I realize I'm responding to this discussion after a long time, 
but this is the first chance I've had to return to this thread...


What you write is correct. There's nothing wrong with wanting 
both change and stability, because there are right ways to change 
the language and wrong ways to change the language.


If you have a stable compiler release for which you know there 
will be no breaking changes for the next two years, you can 
distribute your code to someone else and know it will work. It's 
not unreasonable to say "Your compiler is three years old, you 
need to upgrade it." You will not receive a phone call from 
someone that doesn't know anything about D in the middle of your 
workday inquiring about why the program no longer compiles. 
Having to deal with the possibility that others might have any of 
twelve different compiler versions installed just isn't 
sustainable.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-04 Thread tide via Digitalmars-d

On Tuesday, 4 September 2018 at 21:36:16 UTC, Walter Bright wrote:

On 9/1/2018 4:12 AM, Chris wrote:
Hope is usually the last thing to die. But one has to be wise 
enough to see that sometimes there is nothing one can do. As 
things are now, for me personally D is no longer an option, 
because of simple basic things, like autodecode, a flaw that 
will be there forever, poor support for industry technologies 
(Android, iOS) and the constant "threat" of code breakage. The 
D language developers don't seem to understand the importance 
of these trivial matters. I'm not just opinionating, by now I 
have no other _choice_ but to look for alternatives - and I do 
feel a little bit sad.


Android, iOS - Contribute to help make it better.


It would help if the main official compiler supported those 
operating systems. That would mean adding ARM support to DMD. Or 
a much simpler solution, use an existing backend that has ARM 
support built in to it and is maintained by a much larger 
established group of individuals. Say like how some languages, 
like Rust, do.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-04 Thread Walter Bright via Digitalmars-d

On 9/4/2018 12:59 PM, Timon Gehr wrote:

 [...]


Thanks for the great explanation! Not sure I thoroughly understand it, though.

Therefore, D immutable/pure are both too strong and too weak: they prevent 
@system code from implementing value representations that internally use 
mutation (therefore D cannot implement its own runtime system, or alternatives 
to it), and it does not prevent pure @safe code from leaking reference 
identities of immutable value representations:


pure @safe naughty(immutable(int[]) xs){
     return cast(long)xs.ptr;
}

(In fact, it is equally bad that @safe weakly pure code can depend on the 
address of mutable data.)


Would it make sense to disallow such casts in pure code?

What other adjustments would you suggest?


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-04 Thread Walter Bright via Digitalmars-d

On 9/1/2018 4:12 AM, Chris wrote:
Hope is usually the last thing to die. But one has to be wise enough to see that 
sometimes there is nothing one can do. As things are now, for me personally D is 
no longer an option, because of simple basic things, like autodecode, a flaw 
that will be there forever, poor support for industry technologies (Android, 
iOS) and the constant "threat" of code breakage. The D language developers don't 
seem to understand the importance of these trivial matters. I'm not just 
opinionating, by now I have no other _choice_ but to look for alternatives - and 
I do feel a little bit sad.


Autodecode - I've suffered under that, too. The solution was fairly simple. 
Append .byCodeUnit to strings that would otherwise autodecode. Annoying, but 
hardly a showstopper.
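
A minimal sketch of what appending `.byCodeUnit` changes, using a comma-separated line as the example input. The helper names `fieldsDecoded` and `fieldsRaw` are invented here; the sketch assumes the delimiter is ASCII, which is what makes skipping the decode step safe.

```d
import std.algorithm.searching : count;
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

/// Counts fields while Phobos decodes every element to a dchar.
size_t fieldsDecoded(string line) { return line.count(',') + 1; }

/// Counts fields over raw UTF-8 code units, with no decoding.
size_t fieldsRaw(string line) { return line.byCodeUnit.count(',') + 1; }

void main()
{
    string line = "\u00E1,b,c";   // "á,b,c"

    // As a range, a plain string yields decoded code points ...
    static assert(is(ElementType!string == dchar));
    // ... while the byCodeUnit wrapper yields char-sized code units.
    static assert(is(ElementType!(typeof(line.byCodeUnit)) : char));

    assert(line.walkLength == 5);          // code points
    assert(line.byCodeUnit.length == 6);   // UTF-8 code units

    // For an ASCII needle both views give the same answer.
    assert(fieldsDecoded(line) == 3);
    assert(fieldsRaw(line) == 3);
}
```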


Android, iOS - Contribute to help make it better.

Breakage - I've dealt with this, too. The language changes have usually needed 
just some minor edits. The more serious problems were the removal of some Phobos 
packages. I dealt with this by creating the undeaD library:


https://github.com/dlang/undeaD


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-04 Thread Timon Gehr via Digitalmars-d

On 29.08.2018 22:01, Walter Bright wrote:

On 8/29/2018 10:50 AM, Timon Gehr wrote:
D const/immutable is stronger than immutability in Haskell (which is 
usually _lazy_).


I know Haskell is lazy, but don't see the connection with a weaker 
immutability guarantee.


In D, you can't have a lazy value within an immutable data structure 
(__mutable will fix this).



In any case, isn't immutability a precept of FP?


Yes, but it's at a higher level of abstraction. The important property 
of a (lazy) functional programming language is that a language term can 
be deterministically assigned a value for each concrete instance of an 
environment in which it is well-typed (i.e., values for all free 
variables of the term). Furthermore, the language semantics can be given 
as a rewrite system such that each rewrite performed by the system 
preserves the semantics of the rewritten term. I.e., terms change, but 
their values are preserved (immutable). [1]


To get this property, it is crucially important the functional 
programming system does not leak reference identities of the underlying 
value representations. This is sometimes called referential 
transparency. Immutability is a means to this end. (If references allow 
mutation, you can detect reference equality by modifying the underlying 
object through one reference and observing that the data accessed 
through some other reference changes accordingly.)


Under the hood, functional programming systems simulate term rewriting 
in some way, ultimately using mutable data structures. Similarly, in D, 
the garbage collector is allowed to change data that has been previously 
typed as immutable, and it can type-cast data that has been previously 
typed as mutable to immutable. However, it is impossible to write a GC 
or Haskell-like programs in D with pure functions operating on immutable 
data, because of constraints the language puts on user code that 
druntime is not subject to.


Therefore, D immutable/pure are both too strong and too weak: they 
prevent @system code from implementing value representations that 
internally use mutation (therefore D cannot implement its own runtime 
system, or alternatives to it), and it does not prevent pure @safe code 
from leaking reference identities of immutable value representations:


pure @safe naughty(immutable(int[]) xs){
    return cast(long)xs.ptr;
}

(In fact, it is equally bad that @safe weakly pure code can depend on 
the address of mutable data.)
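The leak can be made observable with a small sketch. This is an illustration, not code from the thread: `addressOf` is a hypothetical name, and `&xs[0]` is used in place of `.ptr` on the assumption that newer compilers reject `.ptr` on slices in @safe code. Two immutable arrays with equal contents produce different results from a pure @safe function, so the result depends on reference identity rather than on value.

```d
// Pure and @safe, yet the result is the address of the data, not its value.
size_t addressOf(immutable(int)[] xs) pure @safe
{
    return cast(size_t) &xs[0];
}

void main()
{
    immutable(int)[] a = [1, 2, 3];
    immutable(int)[] b = a.idup; // equal contents, separate allocation

    assert(a == b);                         // same value...
    assert(addressOf(a) != addressOf(b));   // ...but distinguishable
}
```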




[1] E.g.:

(λa b. a + b) 2 3

and

10 `div` 2

are two terms whose semantics are given as the mathematical value 5.

During evaluation, terms change:

(λa b. a + b) 2 3 ⇝ 2 + 3 ⇝ 5
10 `div` 2 ⇝ 5

However, each intermediate term still represents the same value.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-02 Thread Chris via Digitalmars-d
On Saturday, 1 September 2018 at 21:18:27 UTC, Nick Sabalausky 
(Abscissa) wrote:

On 09/01/2018 07:12 AM, Chris wrote:


Hope is usually the last thing to die. But one has to be wise 
enough to see that sometimes there is nothing one can do. As 
things are now, for me personally D is no longer an option, 
because of simple basic things, like autodecode, a flaw that 
will be there forever, poor support for industry technologies 
(Android, iOS)


Much as I hate to agree, that IS one thing where I'm actually 
in the same boat:


My primary current paid project centers around converting some 
legacy Flash stuff to...well, to NOT Flash obviously. I *want* 
to use D for this very badly. But I'm not. I'm using Unity3D 
because:


1. For our use right now: It has ready-to-go out-of-the-box 
WebAsm support (or is it asm.js? Whatever...I can't keep up 
with the neverending torrent of rubble-bouncing from the web 
client world.)


2. For our use later: It has ready-to-go out-of-the-box 
iOS/Android support (along with just about any other platform 
we could ever possibly hope to care about).


3. It has all the robust multimedia functionality we need 
ready-to-go on all platforms (actually, its capabilities are 
totally overkill for us, but that's not a bad problem to have).


4. C# isn't completely totally horrible.

I will be migrating the server back-end to D, but I *really* 
wish I could be doing the client-side in D too, even if that 
meant having to build an entire 2D engine off nothing more than 
SDL.


Unfortunately, I just don't feel I can trust the D experience 
to be robust enough on those platforms right now, and I 
honestly have no idea when or even if it will get there (Maybe 
I'm wrong on that. I hope I am. But that IS my impression even 
as the HUUUGE D fan I am.)


"when or even if" I'm in the same situation but I can't wait 
anymore. Apps are everywhere these days and if you can't provide 
some sort of app, you're not in a good position. It's the reality 
of things, it's not a game, for many of us our jobs depend on it.


Btw, why did I get this message yesterday:

"Your message has been saved, and will be posted after being 
approved by a moderator."


My message hasn't shown up yet as it hasn't been approved yet ;)




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-01 Thread Nick Sabalausky (Abscissa) via Digitalmars-d

On 09/01/2018 07:12 AM, Chris wrote:


Hope is usually the last thing to die. But one has to be wise enough to 
see that sometimes there is nothing one can do. As things are now, for 
me personally D is no longer an option, because of simple basic things, 
like autodecode, a flaw that will be there forever, poor support for 
industry technologies (Android, iOS)


Much as I hate to agree, that IS one thing where I'm actually in the 
same boat:


My primary current paid project centers around converting some legacy 
Flash stuff to...well, to NOT Flash obviously. I *want* to use D for 
this very badly. But I'm not. I'm using Unity3D because:


1. For our use right now: It has ready-to-go out-of-the-box WebAsm 
support (or is it asm.js? Whatever...I can't keep up with the 
neverending torrent of rubble-bouncing from the web client world.)


2. For our use later: It has ready-to-go out-of-the-box iOS/Android 
support (along with just about any other platform we could ever possibly 
hope to care about).


3. It has all the robust multimedia functionality we need ready-to-go on 
all platforms (actually, its capabilities are totally overkill for us, 
but that's not a bad problem to have).


4. C# isn't completely totally horrible.

I will be migrating the server back-end to D, but I *really* wish I 
could be doing the client-side in D too, even if that meant having to 
build an entire 2D engine off nothing more than SDL. Unfortunately, I 
just don't feel I can trust the D experience to be robust enough on 
those platforms right now, and I honestly have no idea when or even if 
it will get there (Maybe I'm wrong on that. I hope I am. But that IS my 
impression even as the HUUUGE D fan I am.)


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-01 Thread Chris via Digitalmars-d

On Friday, 31 August 2018 at 18:24:40 UTC, Laeeth Isharc wrote:

On Friday, 31 August 2018 at 09:37:55 UTC, Chris wrote:
On Wednesday, 29 August 2018 at 23:47:11 UTC, Laeeth Isharc 
wrote:

On Tuesday, 28 August 2018 at 08:51:27 UTC, Chris wrote:





9. I hope D will be great again


Are you someone who lives by hope and fears about things that 
have a meaning for you?  Or do you prefer to take action?  If 
the latter, what do you think might be some small step you 
could take to move the world towards the direction in which you 
think it should head.  My experience of life is that in the end 
one way and another everything one does, big or small, turns 
out to matter and also that great things can have quite little 
beginnings.


So what could you do towards the end you hope for ?


Hope is usually the last thing to die. But one has to be wise 
enough to see that sometimes there is nothing one can do. As 
things are now, for me personally D is no longer an option, 
because of simple basic things, like autodecode, a flaw that will 
be there forever, poor support for industry technologies 
(Android, iOS) and the constant "threat" of code breakage. The D 
language developers don't seem to understand the importance of 
these trivial matters. I'm not just opinionating, by now I have 
no other _choice_ but to look for alternatives - and I do feel a 
little bit sad.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-09-01 Thread Chris via Digitalmars-d

On Friday, 31 August 2018 at 15:43:13 UTC, H. S. Teoh wrote:


[...]
I wasn't talking about that, but about the fact that users are 
slowly but surely nudged into a certain direction. And yes, D 
was advertised as a "no ideology language".


Sorry, "slowly but surely nudged" sounds very different from 
"forcing you into a new paradigm every 1 1/2 years".  So which 
is it?  A nudge, presumably from recommended practices which 
you don't really have to follow (e.g., I don't follow all D 
recommended practices in my own code), or a strong coercion 
that forces you to rewrite your code in a new paradigm or else?



T


Ah yeah, fair play to you. I knew someone would see the force / 
nudge thing. You're nudged over the years until you end up being 
forced to use a certain paradigm. There's nothing wrong with 
languages "forcing" you to use certain paradigms as long as it's 
clear from the start and you know what you're in for. But moving 
the goalposts as you go along is a bit meh. I remember that 
Walter said that once he didn't care about (or even understand) 
templates. Then it was all templates, now it's functional 
programming (which I like). What will be next? Forced `assert` 
calls in every function? I can already see it... But, again, it's 
this attitude of nitpicking over words (nudge / force) instead of 
addressing the issues that alarms me. It's not a good sign.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-31 Thread Laeeth Isharc via Digitalmars-d

On Friday, 31 August 2018 at 09:37:55 UTC, Chris wrote:
On Wednesday, 29 August 2018 at 23:47:11 UTC, Laeeth Isharc 
wrote:

On Tuesday, 28 August 2018 at 08:51:27 UTC, Chris wrote:





9. I hope D will be great again


Are you someone who lives by hope and fears about things that 
have a meaning for you?  Or do you prefer to take action?  If the 
latter, what do you think might be some small step you could take 
to move the world towards the direction in which you think it 
should head.  My experience of life is that in the end one way 
and another everything one does, big or small, turns out to 
matter and also that great things can have quite little 
beginnings.


So what could you do towards the end you hope for ?




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-31 Thread H. S. Teoh via Digitalmars-d
On Fri, Aug 31, 2018 at 03:13:30PM +0000, Chris via Digitalmars-d wrote:
> On Friday, 31 August 2018 at 14:38:36 UTC, H. S. Teoh wrote:
> > On Fri, Aug 31, 2018 at 09:37:55AM +0000, Chris via Digitalmars-d wrote:
> > [...]
> > > 3. moving the goal posts all the time and forcing you into a new
> > > paradigm every 1 1/2 years (first it was "ranges", then
> > > "templates" and now it's "functional", wait OOP will come back one
> > > day).
> > [...]
> > 
> > Wait, what?  Since when has this ever been a "choose one paradigm
> > among many" deal?  Templates are what enables range-based idioms to
> > succeed, and ranges are what makes it possible to write
> > functional-like code in D.  Since when have they become mutually
> > exclusive?!
[...]
> I wasn't talking about that, but about the fact that users are slowly
> but surely nudged into a certain direction. And yes, D was advertised
> as a "no ideology language".

Sorry, "slowly but surely nudged" sounds very different from "forcing
you into a new paradigm every 1 1/2 years".  So which is it?  A nudge,
presumably from recommended practices which you don't really have to
follow (e.g., I don't follow all D recommended practices in my own
code), or a strong coercion that forces you to rewrite your code in a
new paradigm or else?


T

-- 
IBM = I'll Buy Microsoft!


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-31 Thread Chris via Digitalmars-d

On Friday, 31 August 2018 at 14:38:36 UTC, H. S. Teoh wrote:
On Fri, Aug 31, 2018 at 09:37:55AM +0000, Chris via 
Digitalmars-d wrote: [...]
3. moving the goal posts all the time and forcing you into a 
new paradigm every 1 1/2 years (first it was "ranges", then 
"templates" and now it's "functional", wait OOP will come back 
one day).

[...]

Wait, what?  Since when has this ever been a "choose one 
paradigm among many" deal?  Templates are what enables 
range-based idioms to succeed, and ranges are what makes it 
possible to write functional-like code in D.  Since when have 
they become mutually exclusive?!



T


I wasn't talking about that, but about the fact that users are 
slowly but surely nudged into a certain direction. And yes, D was 
advertised as a "no ideology language".


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-31 Thread H. S. Teoh via Digitalmars-d
On Fri, Aug 31, 2018 at 09:37:55AM +0000, Chris via Digitalmars-d wrote:
[...]
> 3. moving the goal posts all the time and forcing you into a new
> paradigm every 1 1/2 years (first it was "ranges", then "templates"
> and now it's "functional", wait OOP will come back one day).
[...]

Wait, what?  Since when has this ever been a "choose one paradigm among
many" deal?  Templates are what enables range-based idioms to succeed,
and ranges are what makes it possible to write functional-like code in
D.  Since when have they become mutually exclusive?!
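To make the point concrete, here is a short sketch in which all three ideas appear in one function. `sumOfEvenSquares` is a hypothetical helper, not something from the thread: a template constraint accepts any input range, and the body is a functional-style pipeline built from Phobos range algorithms.

```d
import std.algorithm.iteration : filter, map, sum;
import std.range : iota;
import std.range.primitives : isInputRange;

// Template: works for any input range of integers.
// Range pipeline: filter, then map, then sum, with no explicit loop.
auto sumOfEvenSquares(R)(R r) if (isInputRange!R)
{
    return r.filter!(x => x % 2 == 0)
            .map!(x => x * x)
            .sum;
}

void main()
{
    assert(sumOfEvenSquares([1, 2, 3, 4]) == 20); // 4 + 16
    assert(sumOfEvenSquares(iota(1, 7)) == 56);   // 4 + 16 + 36
}
```

The same function accepts an array and a lazily generated `iota` range because the template is instantiated per range type.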


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-31 Thread Chris via Digitalmars-d

On Wednesday, 29 August 2018 at 23:47:11 UTC, Laeeth Isharc wrote:

On Tuesday, 28 August 2018 at 08:51:27 UTC, Chris wrote:




Julia is great.  I don't see it as a competitor to D but for us 
one way researchers might access libraries written in D.  One 
could do quite a lot in it, but I don't much fancy embedding 
Julia in Excel for example, though you could.  Or doing DevOps 
in Julia.  Perhaps more of a Matlab substitute.


Look around and you can find people grumpy about any language 
that's used.

http://www.zverovich.net/2016/05/13/giving-up-on-julia.html

Languages really aren't in a battle to the death with each 
other.

 I find this zero-sum mindset quite peculiar.


I'm old enough to a) not become enthusiastic about a language and 
b) know that you can find fault with any language. It's not about 
"life or death". D was promising and I liked it and it did things 
for me no other language could do for me - back in the day. 
Nowadays many languages have similar features, especially the 
useful ones that have proven to be, well, useful and not the 
latest fad. But D has some major issues that have become clear to 
me after using it for quite a while:


1. unsolved issues like autodecode that nobody seems to care about
2. obvious facepalm moments all over the place (see 1.)
3. moving the goal posts all the time and forcing you into a new 
paradigm every 1 1/2 years (first it was "ranges", then 
"templates" and now it's "functional", wait OOP will come back 
one day). Yeah, a language that doesn't come with a paradigm or 
ideology, no, a language that only nudges you into a certain 
direction and makes your code look old and just so "not 
modern" according to the latest CS fashion of the day. "Why do 
you complain? If you think C++ (as the D leadership did for a 
long time), of course your code will break, you knob! If it 
breaks it's for your own good (for now)."
4. nitpicking over details of half baked features that shouldn't 
be there in the first place, but hey! let's break valid existing 
code to fix them - or not - or, what about @volatileSafeUB (it's 
sooo not C++)? Yeah, sounds great. We'll just have to issue a 
compiler message "error: cannot assign `size_t` to `size_t`"
5. complete and utter negligence of developer reality (ARM, 
Android, iOS, tools etc.). It's all left to spare time 
enthusiasts - and their code will break in 4 weeks too. Just you 
wait and see
6. the leadership doesn't address the issues and gives evasive 
answers as in "Programmers who..." or on hindsight you're always 
wiser, other engineers have made mistakes too
7. I've seen it all before, many times, and it's a sign of a 
sinking ship, rearranging the deck chairs on the Titanic

8. what a pity
9. I hope D will be great again


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-30 Thread Nicholas Wilson via Digitalmars-d

On Thursday, 30 August 2018 at 23:03:57 UTC, Walter Bright wrote:

On 8/25/2018 4:49 PM, Nicholas Wilson wrote:
Run semantic3 on the constructor independent of the 
requirement to destruct already constructed objects. If the 
constructor is nothrow then there is no need to have the 
destructors run or the eh code at all, because no Exceptions 
can be thrown (an Error may be thrown but that will kill the 
program). This is how I intend to fix it after I refactor 
semantic3.


A function can be made nothrow by:

try {
   
} catch (Exception e) {
   ... handle it locally ...
}


Then I should have said: no exceptions can propagate, which is 
the real problem.


Also, your proposal is ignoring the destructors, which is 
literally what the compiler does now.


It was implicit in that the throwing case would call the 
destructors in the event of an exception (otherwise the bug ain't 
fixed). This formulation is to reduce the amount of breakage, 
which was the problem last time.


Yes this will break (as in code breakage) @safe ctors calling 
@system dtors but, such is life. The ctor probably shouldn't be 
throwing in the first place. I'll probably add -vthrowingctor and 
-vthrowingdtor as well since this will be a perf hit in the case 
of a throwing ctor.


Sorry for any confusion.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-30 Thread Walter Bright via Digitalmars-d

On 8/25/2018 4:49 PM, Nicholas Wilson wrote:
Run semantic3 on the constructor independent of the requirement to destruct 
already constructed objects. If the constructor is nothrow then there is no 
need to have the destructors run or the eh code at all, because no Exceptions 
can be thrown (an Error may be thrown but that will kill the program). This is 
how I intend to fix it after I refactor semantic3.


A function can be made nothrow by:

try {
   
} catch (Exception e) {
   ... handle it locally ...
}

Also, your proposal is ignoring the destructors, which is literally what the 
compiler does now.
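A minimal sketch of the pattern above, assuming nothing beyond standard Phobos. `parseOrDefault` is a hypothetical helper: the call to `to!int` can throw, but because every `Exception` is caught locally the compiler accepts the function as `nothrow`.

```d
import std.conv : to;

// to!int may throw a ConvException on bad input. Catching Exception
// inside the body is what allows the nothrow annotation to compile.
int parseOrDefault(string s, int fallback) nothrow
{
    try
    {
        return s.to!int;
    }
    catch (Exception e)
    {
        return fallback; // handled locally; nothing propagates to the caller
    }
}

void main()
{
    assert(parseOrDefault("42", -1) == 42);
    assert(parseOrDefault("not a number", -1) == -1);
}
```

An `Error` (for example a failed assertion) is not covered by `nothrow` and can still escape.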




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-30 Thread Nick Sabalausky (Abscissa) via Digitalmars-d

On 08/29/2018 04:01 PM, Walter Bright wrote:

On 8/29/2018 10:50 AM, Timon Gehr wrote:
D const/immutable is stronger than immutability in Haskell (which is 
usually _lazy_).


I know Haskell is lazy, but don't see the connection with a weaker 
immutability guarantee. In any case, isn't immutability a precept of FP?


I think the point is that it disallows less, and permits more, all 
without breaking immutability.


I.e., lazy immutable *can* be changed, albeit once and only once in a very 
specific circumstance: When transitioning from uninitialized to 
initialized. AIUI, D only has this "the immutable is in-scope, but can 
still be initialized" state within constructors, whereas (it sounds 
like) Haskell allows it anywhere.
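The constructor case mentioned above can be sketched as follows. `Config` and its field are made-up names for illustration; the point is only that the immutable field is in scope but still assignable inside the constructor, and nowhere else.

```d
struct Config
{
    immutable int port;

    this(int requested)
    {
        // Within the constructor the immutable field may be initialized once.
        port = requested > 0 ? requested : 8080;
    }
}

void main()
{
    auto c = Config(0);
    assert(c.port == 8080);

    // Outside the constructor, assigning to the field does not compile.
    static assert(!__traits(compiles, { Config d = Config(1); d.port = 2; }));
}
```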


It's like strong-pure vs weak-pure: Both enforce the same purity 
guarantees, but weak-pure is less restrictive and more expressive.
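A small sketch of the strong-pure / weak-pure distinction, with hypothetical function names. Both functions are marked `pure`; the weakly pure one may mutate data reachable through its parameter, and the strongly pure one can still use it on data it owns.

```d
// Weakly pure: touches no global state, but may mutate what its
// mutable parameter refers to.
void scale(int[] xs, int factor) pure @safe
{
    foreach (ref x; xs)
        x *= factor;
}

// Strongly pure: the parameters give no mutable access, so the result
// depends only on the argument values. Internally it calls the weakly
// pure function on its own private copy.
int[] scaled(immutable(int)[] xs, int factor) pure @safe
{
    int[] copy = xs.dup;
    scale(copy, factor);
    return copy;
}

void main()
{
    immutable(int)[] input = [1, 2, 3];
    assert(scaled(input, 2) == [2, 4, 6]);
    assert(input == [1, 2, 3]); // the immutable input is untouched
}
```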


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-30 Thread Timon Gehr via Digitalmars-d

On 29.08.2018 21:58, Walter Bright wrote:

On 8/29/2018 11:02 AM, Timon Gehr wrote:
Absolutely. But D only strives to provide such automation in @safe 
code. For @system code, we need a formal specification of what is 
allowed. (And it needs to include all things that the GC and language 
do; no magic.) Note that such a formal specification is a prerequisite 
for any (possibly language-external) automated verification approaches.


I don't think that @system code is amenable to formal verification. 
After all, you can do UB in it, and it is the programmer's 
responsibility to ensure it works.


If it's amenable to informal verification, it is also amenable to formal 
verification. Computers can check mathematical proofs, and if the code 
is proven correct it does not contain UB. This is independent of whether 
D classifies the code as @safe or @system.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Laeeth Isharc via Digitalmars-d

On Monday, 27 August 2018 at 20:15:03 UTC, Ali wrote:

On Monday, 27 August 2018 at 19:51:52 UTC, 12345swordy wrote:

On Monday, 27 August 2018 at 18:20:04 UTC, Chris wrote:

Then the D Foundation should work on it.
Easier said than done. You can't go around demanding people to 
build factories without addressing the issues that come with 
building factories, such as the big question of how is it 
going to be paid to be built.


-Alex


No one is (and no one should be) demanding anything, hoping 
maybe..


Walter, wants to build D, and he is doing what he can to 
continue building it

Andrei and many others joined him

If we are sharing our opinion, it's not coming from any sense of 
entitlement, we are sharing our opinion, because the builders, 
provided the platform for us to voice our opinion


And again, because I keep repeating this, if they want more 
donations, I think talking more about the future plans will 
help, D currently has neither a larger user base nor an 
ambitious future plan, so it makes sense that they are not getting 
a lot of donations, they are not really making it attractive to 
donate


I think that most current donors are probably incentivized by 
negative factors, or negatively motivated, they are probably 
afraid D's Development will stop or they feel guilty for using 
the language and not providing much back


I don't think many donors are doing so because they are excited 
about the future


Nothing seriously wrong about negative motivation, it works, 
but positive motivation is  off


I donate to the D Foundation via my personal consulting company 
though it is listed under the name of Symmetry Investments.


I see that I am the second biggest donor after Andrei.  I think I 
can have more insight into my motivations than you can, and I can 
say that I am motivated by enthusiasm about commercial benefits 
and it wouldn't have occurred to me to donate out of fear, as you 
suggest.  If one makes a mistake I am in a business where the 
custom is that one fixes the mistake and moves on.  Suppose it 
were to turn out to have been a mistake to use D.  Well I have 
made costlier mistakes than that this year and it's only August.  
And, as it happens, I don't think it was a mistake.


So you may think what you wish about the motivations of donors, 
but I think you might do well to base your views on evidence not 
imaginings if you wish to be taken seriously :)





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Laeeth Isharc via Digitalmars-d

On Tuesday, 28 August 2018 at 08:51:27 UTC, Chris wrote:

On Tuesday, 28 August 2018 at 08:44:26 UTC, Chris wrote:


When people choose a programming language, there are several 
boxes that have to be ticked, like for example:


- what's the future of language X? (guarantees, stability)
- how easy is it to get going (from "Hello world" to a 
complete tool chain)

- will it run on ARM?
- will it be a good choice for the Web (e.g. webasm)?
- how good is it at data processing / number grinding
- etc.



I don't know if all their claims are 100% true, but let that 
sink in for a while:


https://julialang.org/.


Julia is great.  I don't see it as a competitor to D but for us 
one way researchers might access libraries written in D.  One 
could do quite a lot in it, but I don't much fancy embedding 
Julia in Excel for example, though you could.  Or doing DevOps in 
Julia.  Perhaps more of a Matlab substitute.


Look around and you can find people grumpy about any language 
that's used.

http://www.zverovich.net/2016/05/13/giving-up-on-julia.html

Languages really aren't in a battle to the death with each other. 
 I find this zero-sum mindset quite peculiar.






Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Walter Bright via Digitalmars-d

On 8/29/2018 10:50 AM, Timon Gehr wrote:
D const/immutable is stronger than immutability in Haskell (which is usually 
_lazy_).


I know Haskell is lazy, but don't see the connection with a weaker immutability 
guarantee. In any case, isn't immutability a precept of FP?


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Walter Bright via Digitalmars-d

On 8/29/2018 10:05 AM, Timon Gehr wrote:
This is a misunderstanding. The __mutable DIP will define the set of allowed 
program rewrites based on const/immutable/pure. Then code that uses __mutable 
must remain correct when they are applied. This achieves two things: it clearly 
defines the semantics of const/immutable/pure and (the possibility of) __mutable 
will not be an optimization blocker.


I'll get back to this once I have finished the tuple DIP implementation.


This is good news. I'm looking forward to both of them.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Walter Bright via Digitalmars-d

On 8/29/2018 11:02 AM, Timon Gehr wrote:
Absolutely. But D only strives to provide such automation in @safe code. For 
@system code, we need a formal specification of what is allowed. (And it needs 
to include all things that the GC and language do; no magic.) Note that such a 
formal specification is a prerequisite for any (possibly language-external) 
automated verification approaches.


I don't think that @system code is amenable to formal verification. After all, 
you can do UB in it, and it is the programmer's responsibility to ensure it works.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread tide via Digitalmars-d

On Wednesday, 29 August 2018 at 17:15:15 UTC, H. S. Teoh wrote:
Besides, this is missing the point.  What I meant was that if 
const could be arbitrarily overridden anywhere down the call 
chain, then the compiler could no longer feasibly verify that a 
particular piece of code doesn't violate const. The code could 
be calling a function for which the compiler has no source 
code, and who knows what that function might do. It could 
override const and modify the data willy-nilly, and if the 
const reference is pointing to an immutable object, you're in 
UB land.


Not allowing const to be overridden (without the user 
deliberately treading into UB land by casting it away) allows 
the compiler to statically check that the code doesn't actually 
modify a const object.


You appear to be thinking I was making a statement about 
verifying program correctness in general, which is taking what 
I said out of context.



T


You keep saying that, it has to be machine verifiable, but 
honestly I don't see the benefit to being machine verifiable. As 
in the case it can't verify the object doesn't change in scope at 
all, it can only verify that the code in scope doesn't modify it. 
I'd rather have C++ const and be useful than avoiding const 
almost completely.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread H. S. Teoh via Digitalmars-d
On Wed, Aug 29, 2018 at 07:02:42PM +, Dave Jones via Digitalmars-d wrote:
> On Wednesday, 29 August 2018 at 18:02:16 UTC, Timon Gehr wrote:
> > On 29.08.2018 19:15, H. S. Teoh wrote:
> > > On Wed, Aug 29, 2018 at 06:58:16PM +0200, Timon Gehr via
> > > Digitalmars-d wrote:
> > > > On 28.08.2018 19:02, H. S. Teoh wrote:
> > > > > On Tue, Aug 28, 2018 at 08:18:57AM +0000, Eugene Wissner via
> > > > > Digitalmars-d wrote:
> > > 
> > > Currently, immutable implicitly converts to const. If const is
> > > allowed to be overridden, then you could violate immutable, which
> > > is UB.  ...
> > 
> > __mutable fields are __mutable also in the immutable instance. You
> > might get into trouble with shared if you are not careful because of
> > the unfortunate "implicit shared" semantics of immutable, but it is
> > up to the programmer to get this right.
> 
> So you can't cast away const but you can specify a field stays mutable
> even if the aggregate is const or immutable?

That appears to be the case.  But it scares me that const(T) would no
longer guarantee you can't modify anything in T.  I fear it will break
some subtle assumptions about how const/immutable works, and introduce
hidden bugs to existing code.
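For reference, the guarantee at stake can be sketched like this (the function name `total` is made up for illustration). A const parameter accepts both mutable and immutable data, and the compiler rejects any write through the const view, which is what makes the implicit immutable-to-const conversion safe today.

```d
// A const slice is a read-only view over mutable or immutable data.
int total(const(int)[] xs) pure @safe
{
    int acc = 0;
    foreach (x; xs)
        acc += x;
    return acc;
}

void main()
{
    int[] m = [1, 2, 3];
    immutable(int)[] i = [4, 5, 6];

    assert(total(m) == 6);  // mutable converts to const
    assert(total(i) == 15); // immutable converts to const

    // Writing through a const view is rejected at compile time.
    static assert(!__traits(compiles, (const(int)[] xs) { xs[0] = 1; }));
}
```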


T

-- 
Doubt is a self-fulfilling prophecy.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread H. S. Teoh via Digitalmars-d
On Wed, Aug 29, 2018 at 07:50:57PM +0200, Timon Gehr via Digitalmars-d wrote:
> On 28.08.2018 03:11, Walter Bright wrote:
> > On 8/27/2018 10:08 AM, H. S. Teoh wrote:
> > > Const in D makes sense as-is.  Though, granted, its infectiousness
> > > means its scope is actually very narrow, and as a result, we
> > > ironically can't use it in very many places, and so its touted
> > > benefits only rarely apply. :-(  Which also means that it's taking
> > > up a lot of language design real estate with not many benefits to
> > > show for it.
> > 
> > D const is of great utility if you're interested in functional
> > programming.
> 
> D const/immutable is stronger than immutability in Haskell (which is
> usually _lazy_).

This makes me wonder: is it possible to model a lazy immutable value in
D?

Likely not, if we were to take the immutability literally, since once
the variable is marked immutable and initialized, you couldn't change it
afterwards (without casting and UB).

We *might* be able to get away with a head-mutable reference to the
data, though. Say something like this:

struct LazyImmutable(T, alias initializer)
{
    immutable(T)* impl;
    @property T get() {
        if (impl is null)
            impl = initializer();
        return *impl;
    }
    alias get this;
}

Seems rather cumbersome to use in practice, though.  And adds
indirection overhead to by-value types. One could possibly use emplace
to alleviate that, but still, the variable itself cannot be marked
immutable without breaking its functionality.

Which means you couldn't rely on such a wrapper type to work
transitively in complex types, unlike how immutable applies transitively
to all aggregate members: if T was an aggregate type, they couldn't be
LazyImmutable, but must be actually immutable.  Maybe this can be made
to work, but at the sacrifice of being unable to use built-in type
qualifiers like const/immutable.
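A runnable variant of the wrapper idea, offered as a sketch under stated assumptions rather than a finished design. It assumes `T` is a value type without mutable indirections and that `initializer` returns a `T` by value. The `box` helper, the `calls` counter and `expensive` are hypothetical names added for the demonstration. The value is allocated in a pure factory function so that its unique result converts implicitly to `immutable(T)*`.

```d
struct LazyImmutable(T, alias initializer)
{
    private immutable(T)* impl; // head-mutable: the pointer is set once

    // Pure factory: its unique result converts implicitly to immutable(T)*.
    private static T* box(T value) pure
    {
        auto p = new T;
        *p = value;
        return p;
    }

    @property immutable(T) get()
    {
        if (impl is null)
            impl = box(initializer()); // computed on first access only
        return *impl;
    }

    alias get this;
}

int calls; // counts how often the initializer runs (demonstration only)

int expensive()
{
    ++calls;
    return 42;
}

void main()
{
    LazyImmutable!(int, expensive) v;
    assert(calls == 0);  // nothing computed yet
    int a = v;           // first read runs the initializer
    int b = v + 1;       // later reads reuse the stored value
    assert(a == 42 && b == 43);
    assert(calls == 1);
}
```

As noted above, the wrapper itself cannot be declared immutable: `get` has to mutate `impl`, so the laziness does not compose with the built-in qualifiers.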


T

-- 
Fact is stranger than fiction.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Dave Jones via Digitalmars-d

On Wednesday, 29 August 2018 at 18:02:16 UTC, Timon Gehr wrote:

On 29.08.2018 19:15, H. S. Teoh wrote:
On Wed, Aug 29, 2018 at 06:58:16PM +0200, Timon Gehr via 
Digitalmars-d wrote:

On 28.08.2018 19:02, H. S. Teoh wrote:
On Tue, Aug 28, 2018 at 08:18:57AM +0000, Eugene Wissner via 
Digitalmars-d wrote:



Currently, immutable implicitly converts to const. If const is 
allowed
to be overridden, then you could violate immutable, which is 
UB.

...


__mutable fields are __mutable also in the immutable instance. 
You might get into trouble with shared if you are not careful 
because of the unfortunate "implicit shared" semantics of 
immutable, but it is up to the programmer to get this right.


So you can't cast away const, but you can specify that a field stays 
mutable even if the aggregate is const or immutable?




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Timon Gehr via Digitalmars-d

On 29.08.2018 19:15, H. S. Teoh wrote:

On Wed, Aug 29, 2018 at 06:58:16PM +0200, Timon Gehr via Digitalmars-d wrote:

On 28.08.2018 19:02, H. S. Teoh wrote:

On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via Digitalmars-d 
wrote:
[...]

There are still valid use cases where const should be "broken".
One of them is mutex (another one caching). I have very little
experience in multi-threaded programming, but what do you think
about "mutable" members, even though the object is const?

The problem with compromising const is that it would invalidate any
guarantees const may have provided.


No. You start with the set of allowed program rewrites, then require
code with __mutable to not break under them. Code using __mutable is
unsafe.


Currently, immutable implicitly converts to const. If const is allowed
to be overridden, then you could violate immutable, which is UB.
...


__mutable fields are __mutable also in the immutable instance. You might 
get into trouble with shared if you are not careful because of the 
unfortunate "implicit shared" semantics of immutable, but it is up to 
the programmer to get this right.





Const in D is not the same as const in languages like C++; const in
D means *physical* const, as in, the data might reside in ROM where
it's physically impossible to modify.  Allowing the user to bypass
this means UB if the data exists in ROM.

Plus, the whole point of const in D is that it is
machine-verifiable, i.e., the compiler checks that the code does not
break const in any way and therefore you are guaranteed (barring
compiler bugs) that the data does not change.  If const were not
machine-verifiable, it would be nothing more than programming by
convention, since it would guarantee nothing.  Allowing const to be
"broken" somewhere would mean it's no longer machine-verifiable (you
need a human to verify whether the semantics are still correct).


It is not unusual to need a human to verify that your code does what
it was intended to do.


And it is not unusual for humans to make mistakes and certify code that
is not actually correct.  Automation provides much stronger guarantees
than human verification.
...


Absolutely. But D only strives to provide such automation in @safe code. 
For @system code, we need a formal specification of what is allowed. 
(And it needs to include all things that the GC and language do; no 
magic.) Note that such a formal specification is a prerequisite for any 
(possibly language-external) automated verification approaches.



Besides, this is missing the point.  What I meant was that if const
could be arbitrarily overridden anywhere down the call chain, then the
compiler could no longer feasibly verify that a particular piece of code
doesn't violate const. The code could be calling a function for which
the compiler has no source code, and who knows what that function might
do. It could override const and modify the data willy-nilly, and if the
const reference is pointing to an immutable object, you're in UB land.

Not allowing const to be overridden (without the user deliberately
treading into UB land by casting it away) allows the compiler to
statically check that the code doesn't actually modify a const object.

You appear to be thinking I was making a statement about verifying
program correctness in general, which is taking what I said out of
context.


T



I was thinking you were making a statement about __mutable fields.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Timon Gehr via Digitalmars-d

On 28.08.2018 03:11, Walter Bright wrote:

On 8/27/2018 10:08 AM, H. S. Teoh wrote:

Const in D makes sense as-is.  Though, granted, its infectiousness means
its scope is actually very narrow, and as a result, we ironically can't
use it in very many places, and so its touted benefits only rarely
apply. :-(  Which also means that it's taking up a lot of language
design real estate with not many benefits to show for it.


D const is of great utility if you're interested in functional programming.


D const/immutable is stronger than immutability in Haskell (which is 
usually _lazy_).


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, August 29, 2018 11:15:15 AM MDT H. S. Teoh via Digitalmars-d 
wrote:
> On Wed, Aug 29, 2018 at 06:58:16PM +0200, Timon Gehr via Digitalmars-d 
wrote:
> > On 28.08.2018 19:02, H. S. Teoh wrote:
> > > On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via
> > > Digitalmars-d wrote: [...]
> > >
> > > > There are still valid use cases where const should be "broken".
> > > > One of them is mutex (another one caching). I have very little
> > > > experience in multi-threaded programming, but what do you think
> > > > about "mutable" members, even though the object is const?
> > >
> > > The problem with compromising const is that it would invalidate any
> > > guarantees const may have provided.
> >
> > No. You start with the set of allowed program rewrites, then require
> > code with __mutable to not break under them. Code using __mutable is
> > unsafe.
>
> Currently, immutable implicitly converts to const. If const is allowed
> to be overridden, then you could violate immutable, which is UB.

If I understand correctly, the main reason behind looking to add __mutable
is to be able to do stuff with containers that can't currently be done where
you have a way of knowing whether the actual data is truly immutable or not
and thus can avoid mutating data that's actually immutable. That's still
undefined behavior right now, but presumably, if __mutable were added, it
would then be defined behavior that was highly @system and not intended for
normal code. However, even allowing that much does make it so that the
compiler can't then do any optimizations based on const, since while it may
be possible in some cases to avoid mutating immutable data when casting away
const and mutating, I don't see how it would be possible to guarantee that
it would be done in a way that could not possibly be screwed up by
optimizations made at a higher level based on the fact that the objects in
question are typed as const.

Basically, it seems that Andrei really wants a backdoor in const for certain
use cases, so he's looking to find a way to enable it without really
putting a backdoor in const (IIRC it was discussed as part of this year's
DConf talk about the new containers that one of the students is
working on). I guess that he's managed to talk Timon into working on the
issue for him given Timon's excellent knowledge about the type system and
about the related computer science concepts. We'll see what they come up
with, but it's going to be _very_ difficult to make it so that you can
actually rely on const's guarantees if it has any kind of backdoors at all.
However, given some of the technical issues that they've run into with
allocators and containers, Andrei has been rather motivated to change the
status quo. We'll see what happens.

- Jonathan M Davis





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread H. S. Teoh via Digitalmars-d
On Wed, Aug 29, 2018 at 06:58:16PM +0200, Timon Gehr via Digitalmars-d wrote:
> On 28.08.2018 19:02, H. S. Teoh wrote:
> > On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via Digitalmars-d 
> > wrote:
> > [...]
> > > There are still valid use cases where const should be "broken".
> > > One of them is mutex (another one caching). I have very little
> > > experience in multi-threaded programming, but what do you think
> > > about "mutable" members, even though the object is const?
> > The problem with compromising const is that it would invalidate any
> > guarantees const may have provided.
> 
> No. You start with the set of allowed program rewrites, then require
> code with __mutable to not break under them. Code using __mutable is
> unsafe.

Currently, immutable implicitly converts to const. If const is allowed
to be overridden, then you could violate immutable, which is UB.


> > Const in D is not the same as const in languages like C++; const in
> > D means *physical* const, as in, the data might reside in ROM where
> > it's physically impossible to modify.  Allowing the user to bypass
> > this means UB if the data exists in ROM.
> > 
> > Plus, the whole point of const in D is that it is
> > machine-verifiable, i.e., the compiler checks that the code does not
> > break const in any way and therefore you are guaranteed (barring
> > compiler bugs) that the data does not change.  If const were not
> > machine-verifiable, it would be nothing more than programming by
> > convention, since it would guarantee nothing.  Allowing const to be
> > "broken" somewhere would mean it's no longer machine-verifiable (you
> > need a human to verify whether the semantics are still correct).
> 
> It is not unusual to need a human to verify that your code does what
> it was intended to do.

And it is not unusual for humans to make mistakes and certify code that
is not actually correct.  Automation provides much stronger guarantees
than human verification.

Besides, this is missing the point.  What I meant was that if const
could be arbitrarily overridden anywhere down the call chain, then the
compiler could no longer feasibly verify that a particular piece of code
doesn't violate const. The code could be calling a function for which
the compiler has no source code, and who knows what that function might
do. It could override const and modify the data willy-nilly, and if the
const reference is pointing to an immutable object, you're in UB land.

Not allowing const to be overridden (without the user deliberately
treading into UB land by casting it away) allows the compiler to
statically check that the code doesn't actually modify a const object.

You appear to be thinking I was making a statement about verifying
program correctness in general, which is taking what I said out of
context.


T

-- 
It is not the employer who pays the wages. Employers only handle the money. It 
is the customer who pays the wages. -- Henry Ford


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Timon Gehr via Digitalmars-d

On 29.08.2018 03:59, Walter Bright wrote:
There's been some talk of adding a "mutable" qualifier for fields, which 
would stop the transitivity of const at that point. But it has problems, 
such as what happens with opaque types. The compiler can no longer check 
them, and hence will have to assume they contain mutable members.


This is a misunderstanding. The __mutable DIP will define the set of 
allowed program rewrites based on const/immutable/pure. Then code that 
uses __mutable must remain correct when they are applied. This achieves 
two things: it clearly defines the semantics of const/immutable/pure and 
(the possibility of) __mutable will not be an optimization blocker.
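
As an illustration only, here is the sort of rewrite that const/immutable/pure already seem to permit. This is an assumption about what "allowed program rewrites" could cover, not something taken from the DIP; the function names are hypothetical. A strongly pure function called twice with the same immutable argument could be collapsed into a single call:

```d
// Strongly pure: immutable input, no access to mutable global state.
int checksum(immutable(int)[] data) pure nothrow @safe
{
    int acc = 0;
    foreach (x; data)
        acc = acc * 31 + x;
    return acc;
}

// As written by the programmer: two calls with the same argument.
int twiceAsWritten(immutable(int)[] data) pure nothrow @safe
{
    return checksum(data) + checksum(data);
}

// What an optimizer could turn it into: one call, result reused.
int twiceAfterRewrite(immutable(int)[] data) pure nothrow @safe
{
    immutable once = checksum(data);
    return once + once;
}
```

Under the approach described above, code using __mutable would have to behave the same whether or not a rewrite like this is applied.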


I'll get back to this once I have finished the tuple DIP implementation.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Timon Gehr via Digitalmars-d

On 28.08.2018 19:02, H. S. Teoh wrote:

On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via Digitalmars-d 
wrote:
[...]

There are still valid use cases where const should be "broken". One of
them is mutex (another one caching). I have very little experience in
multi-threaded programming, but what do you think about "mutable"
members, even though the object is const?

The problem with compromising const is that it would invalidate any
guarantees const may have provided.


No. You start with the set of allowed program rewrites, then require 
code with __mutable to not break under them. Code using __mutable is unsafe.
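
For reference, a small sketch of the caching use case as it stands today, without __mutable. The `Squarer` type and `demoSquares` are hypothetical names for this illustration. The point is that a const method cannot fill a cache, so a const object has to recompute:

```d
struct Squarer
{
    int input;
    private int cached;
    private bool hasCache;

    // Mutable method: free to fill the cache on first use.
    int square()
    {
        if (!hasCache)
        {
            cached = input * input;
            hasCache = true;
        }
        return cached;
    }

    // Const method: may read the fields, but cannot store a new result.
    int squareConst() const
    {
        return hasCache ? cached : input * input;
    }
}

// Calling the caching method through a const object is rejected.
static assert(!__traits(compiles, { const Squarer s; auto r = s.square(); }));

// Assigning to a field through a const object is rejected as well.
static assert(!__traits(compiles, { const Squarer s; s.cached = 1; }));

int demoSquares()
{
    auto s = Squarer(7);
    const c = s;                          // const copy, cache still empty
    return s.square() + c.squareConst();  // cached result plus recomputed one
}
```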



Const in D is not the same as const
in languages like C++; const in D means *physical* const, as in, the
data might reside in ROM where it's physically impossible to modify.
Allowing the user to bypass this means UB if the data exists in ROM.

Plus, the whole point of const in D is that it is machine-verifiable,
i.e., the compiler checks that the code does not break const in any way
and therefore you are guaranteed (barring compiler bugs) that the data
does not change.  If const were not machine-verifiable, it would be
nothing more than programming by convention, since it would guarantee
nothing.  Allowing const to be "broken" somewhere would mean it's no
longer machine-verifiable (you need a human to verify whether the
semantics are still correct).


It is not unusual to need a human to verify that your code does what it 
was intended to do.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Timon Gehr via Digitalmars-d

On 27.08.2018 11:14, Chris wrote:


It is unrealistic to assume that code will never break. But as I said in 
my post above, dmd should give guarantees of backward compatibility of 
at least N versions. Then we could be more relaxed about our code.


Each breaking change occurs between two adjacent compiler versions.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread H. S. Teoh via Digitalmars-d
On Wed, Aug 29, 2018 at 01:02:54AM +, tide via Digitalmars-d wrote:
[...]
> Point being, there is a huge difference between what you were saying,
> and what you are saying now. "This data never changes" is a much
> better guarantee and check than "this code does not modify this data".
> You use const to make sure the data doesn't change; if you can't
> guarantee it doesn't change from any other code, then I wouldn't say it
> is machine-verifiable.

You appear to be still misunderstanding how it works.  In D, if you want
to make sure the data never changes, you use immutable.  Const is for
when you want to make sure a piece of code doesn't modify the data (even
if the data is mutable elsewhere).  Both are machine-verifiable. As in,
the compiler can verify that the code never touches the data.  Immutable
provides the strongest guarantee (no code anywhere modifies this data),
while const provides a weaker guarantee (this code doesn't modify this
data, but somebody else might).

The usefulness of const is that you can safely pass *both* mutable and
immutable data through it, and you're guaranteed there will be no
problems, because const does not allow the code to touch the data. If
the code does not need to touch the data, then it could take the data as
const, and you could use the same code to handle both mutable and
immutable data.

All of this breaks down if you allow const to be overridden anywhere.
That's why it's UB to cast away const.
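
A minimal sketch of that point, with hypothetical function names: one function taking a const slice serves both mutable and immutable callers, and the compiler rejects any attempt to write through the const view.

```d
// Takes a const view: this function cannot modify `data`.
int total(const(int)[] data) pure nothrow @safe
{
    int sum = 0;
    foreach (x; data)
        sum += x;
    return sum;
}

// Writing through a const slice does not compile.
static assert(!__traits(compiles, { const(int)[] d; d[0] = 1; }));

int sumBoth()
{
    int[] mutableData = [1, 2, 3];
    immutable(int)[] frozenData = [10, 20, 30];
    // Both convert implicitly to const(int)[], so one function handles both.
    return total(mutableData) + total(frozenData);
}
```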


> So we would need another qualifier "tantamount" to be implemented then
> it seems.

I don't understand what you mean by this. Could you clarify?
 

T

-- 
You only live once.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-29 Thread Walter Bright via Digitalmars-d

On 8/26/2018 6:09 PM, Jonathan M Davis wrote:

I don't know what Walter's current plans are for what any built-in
ref-counting solution would look like, but it's my understanding that
whatever he was working on was put on hold, because he needed something like
DIP 1000 in order to make it work with @safe - which is what then triggered
his working on DIP 1000 like he has been. So, presumably, at some point
after DIP 1000 is complete and ready, he'll work on the ref-counting stuff
again. So, while we may very well get it, I expect that it will be a while.


DIP1000 is needed to make ref counting memory safe.

Andrei is working on ref counting, and he's concluded that copy constructors are 
a key feature to make it work. Postblits are, sadly, a great idea that is 
just a failure.


Hence, DIP1000 and copy constructors (Razvan is working on that) are key 
technologies.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Tobias Müller via Digitalmars-d
H. S. Teoh  wrote:
> On Tue, Aug 28, 2018 at 10:20:06AM -0700, Manu via Digitalmars-d wrote:
> [...]
> Actually, I think C++ const is not very useful, because it guarantees
> nothing. At the most, it's just a sanity checker to make sure the
> programmer didn't accidentally do something dumb. But given an opaque
> C++ function that takes const parameters, there is ZERO guarantee that
> it doesn't actually modify stuff behind your back, and do so legally
> (per spec).

No, casting away const on pointers and references is only legal if the
object pointed to is actually mutable (not const). Everything else is UB.
Casting away const of a function parameter that is not under your control
will sooner or later lead to UB.

Tobi


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Manu via Digitalmars-d
On Tue, 28 Aug 2018 at 19:00, Walter Bright via Digitalmars-d
 wrote:
>
> There's been some talk of adding a "mutable" qualifier for fields, which would
> stop the transitivity of const at that point. But it has problems, such as 
> what
> happens with opaque types. The compiler can no longer check them, and hence 
> will
> have to assume they contain mutable members.

Exactly. And you arrive at C++.
'c-const' and 'turtles-const' probably need to be specified
differently from the top, not broken along the way with the likes of
mutable.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Manu via Digitalmars-d
On Tue, 28 Aug 2018 at 10:54, H. S. Teoh via Digitalmars-d
 wrote:
>
> On Tue, Aug 28, 2018 at 10:20:06AM -0700, Manu via Digitalmars-d wrote:
> [...]
> > The reality is though, that D's const is not actually very useful, and
> > C++'s const is.
>
> Actually, I think C++ const is not very useful, because it guarantees
> nothing. At the most, it's just a sanity checker to make sure the
> programmer didn't accidentally do something dumb.

I'd rate that as "pretty damn useful"™!

> But given an opaque
> C++ function that takes const parameters, there is ZERO guarantee that
> it doesn't actually modify stuff behind your back, and do so legally
> (per spec).

Well it can't modify the head-object... that's the point of head-const!

> I mean, how many times have you written const_cast<...>
> just to get a piece of code to compile?

Never in my life. That's a heinous crime. If it were removed from C++
and declared UB, I'd be fine with that.

> I know I've been guilty of this
> in many places, because it simply isn't worth the effort to track down
> all the places of the code that you need to fix to make it
> const-correct.  So basically, C++ const is nothing more than an
> annotation that isn't really enforced.

It could be enforced though. const_cast<> doesn't have to exist, and
`mutable` doesn't have to exist either.
That would strengthen C++'s design to make it more meaningful while
retaining a generally useful semantic.

That said, D's transitive const is a nice thing to be able to
express... I just recognise that it's mostly useless, and from that
perspective, I think being able to express the C++ meaning would be
useful, and certainly MORE useful.
I wonder if there's a design that could allow expressing both options
selectively?

> But you're spot on about D's const, though.  While D's const *does*
> provide real guarantees (unless you tread into UB territory by casting
> it away), that also limits its scope so much that it's rarely useful
> outside of rather narrow confines.  Yet because it's so strict, using it
> requires investing significant effort.  So you end up with the
> unfortunate situation of "a lot of effort" + "limited usefulness" which
> for many people equals "not worth using".

And then that case of not being used (even if it could have) blocks
use somewhere else, and not(/unable-to)-const spreads like a virus >_<

> > D has no way to express head-const, and it turns out it's a
> > tremendously useful concept.
>
> I can live without head-const... but what *really* makes const painful
> for me is the lack of head-mutable. I.e., given a const container (which
> implies const objects), there is no universal way to obtain a mutable
> reference to said const objects,

... I think we're talking about the same thing.
In this context, the container is the 'head', and the elements would
be mutable beneath that unless declared const themselves.
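
A small sketch of the head-mutable idea as it exists for built-in slices today. The `Bag` type and the function names are hypothetical, for illustration only. Slices get a head-mutable form for free, while a user-defined container has no equivalent implicit conversion:

```d
// const(int)[] is "head-mutable": the slice can be rebound, the elements cannot.
int sumByPopping(const(int)[] r) pure nothrow @safe
{
    int sum = 0;
    while (r.length)
    {
        sum += r[0];
        r = r[1 .. $]; // advancing only changes the local slice header
    }
    return sum;
}

// const(int[]) is fully const: not even the slice header may change.
static assert(!__traits(compiles, { const(int[]) f; f = f[1 .. $]; }));

// Hypothetical user-defined container, for illustration only.
struct Bag
{
    int[] items;
}

// Built-in slices convert to their head-mutable form implicitly ...
static assert(is(const(int[]) : const(int)[]));
// ... but a const Bag has no implicit mutable-header counterpart.
static assert(!is(const(Bag) : Bag));

int demoHeadMutable()
{
    const(int[]) fixed = [1, 2, 3];
    return sumByPopping(fixed); // passes a head-mutable copy of the slice
}
```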

> unless you tread into UB territory by
> forcefully casting it away.  This makes const so limited in
> applicability that, for the most part, I've given up using const at all,
> in spite of having tried quite hard to use it as much as possible for
> years.

Right. This appears to be the accepted recommendation for quite some
time, and no change in sight.
Tragically, the more people resign themselves to this recommendation (and it's
practically official at this stage), the harder it becomes to use even
if you want to; any library code that you interact with that didn't
use const because 'recommendation' creates interaction blockages for
your own code, propagating can't-use-const into your client code,
despite your best intentions.

D's const is an objective failure. I don't think anyone could argue
otherwise with a straight face. It's sad but true; the surface area
and complexity of the feature absolutely doesn't justify its limited
(and actively waning) usefulness.



Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Walter Bright via Digitalmars-d
There's been some talk of adding a "mutable" qualifier for fields, which would 
stop the transitivity of const at that point. But it has problems, such as what 
happens with opaque types. The compiler can no longer check them, and hence will 
have to assume they contain mutable members.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Walter Bright via Digitalmars-d
Thanks, that's a good explanation of the point of the differences between const 
and immutable.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread tide via Digitalmars-d

On Tuesday, 28 August 2018 at 20:32:29 UTC, H. S. Teoh wrote:
On Tue, Aug 28, 2018 at 07:39:20PM +, tide via 
Digitalmars-d wrote:

On Tuesday, 28 August 2018 at 17:02:46 UTC, H. S. Teoh wrote:
> On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via 
> Digitalmars-d wrote: [...]
> > There are still valid use cases where const should be 
> > "broken". One of them is mutex (another one caching). I 
> > have very little experience in multi-threaded programming, 
> > but what do you think about "mutable" members, even though the 
> > object is const?
> 
> The problem with compromising const is that it would 
> invalidate any guarantees const may have provided.  Const in 
> D is not the same as const in languages like C++; const in D 
> means *physical* const, as in, the data might reside in ROM 
> where it's physically impossible to modify.  Allowing the 
> user to bypass this means UB if the data exists in ROM.


I feel that's such a narrow use case; wouldn't you just use 
something like immutable instead?


The problem is that immutable implicitly converts to const.  
Basically, const means "I guarantee I will never modify this 
data (though someone else might)", and immutable means "nobody 
will ever modify this data". You cannot allow const to mutate 
without risking breakage with immutable.  If the original data 
came from a mutable reference, you can probably get away with 
casting const away. But if it came from an immutable object, 
casting const away is UB.  Allowing const to be "sometimes" 
modified is also UB.



> Plus, the whole point of const in D is that it is 
> machine-verifiable, i.e., the compiler checks that the code 
> does not break const in any way and therefore you are 
> guaranteed (barring compiler bugs) that the data does not 
> change.  If const were not machine-verifiable, it would be 
> nothing more than programming by convention, since it would 
> guarantee nothing.  Allowing const to be "broken" somewhere 
> would mean it's no longer machine-verifiable (you need a 
> human to verify whether the semantics are still correct).


This is still not true; it is not machine-verifiable as it is. 
It can be bypassed quite easily, as a const object can be 
assigned from a non-const one. There's no way to offer that 
guarantee.


You misunderstand. Const means "this code cannot modify this 
object no matter what".  It does not guarantee somebody else 
can't modify it (you want immutable for that).  Both mutable 
and immutable implicitly convert to const, therefore it is 
imperative that code that handles const never modifies the 
data, because you don't know the provenance of the data: it 
could have come from an immutable object.  Allowing const to 
"sometimes" modify stuff will violate immutable and cause UB.


Whether a piece of code modifies the data is certainly 
machine-verifiable -- but only if there are no backdoors to 
const. If there are, then the compiler cannot feasibly verify 
const, since it would need to transitively examine all code 
called by the code in question, but the source code may not be 
always available.


Even if the data came from a mutable object, it does not make 
it any less machine-verifiable, since what we're verifying is 
"this code does not modify this data", not "this data never 
changes".  For the latter, immutable provides that guarantee, 
not const.  It is possible, for example, to obtain a const 
reference to a mutable object, and have one thread modify the 
object (via the mutable reference) while another thread reads 
it (via the const reference).  You cannot guarantee that the 
data itself won't change, but you *can* guarantee that the code 
holding the const reference (without access to the mutable 
reference) isn't the one making the changes.



T


Point being, there is a huge difference between what you were 
saying, and what you are saying now. "This data never changes" is 
a much better guarantee and check than "this code does not modify 
this data". You use const to make sure the data doesn't change; 
if you can't guarantee it doesn't change from any other code, then 
I wouldn't say it is machine-verifiable.


So we would need another qualifier "tantamount" to be implemented 
then it seems.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread H. S. Teoh via Digitalmars-d
On Tue, Aug 28, 2018 at 06:44:37PM +, aliak via Digitalmars-d wrote:
> On Tuesday, 28 August 2018 at 17:53:36 UTC, H. S. Teoh wrote:
> > On Tue, Aug 28, 2018 at 10:20:06AM -0700, Manu via
> > > D has no way to express head-const, and it turns out it's a
> > > tremendously useful concept.
> > 
> > I can live without head-const... but what *really* makes const
> > painful for me is the lack of head-mutable. I.e., given a const
> > container (which implies const objects), there is no universal way
> > to obtain a mutable reference to said const objects, unless you
> > tread into UB territory by forcefully casting it away.  This makes
> > const so limited in applicability that, for the most part, I've
> > given up using const at all, in spite of having tried quite hard to
> > use it as much as possible for years.
> 
> Simen's opHeadMutable [0] was pretty good solution to this const range
> stuff, but for some reason (not specified by anyone in the thread) it
> didn't seem to catch on :/
> 
> [0] https://forum.dlang.org/post/zsaqtmvqmfkzhrmrm...@forum.dlang.org
[...]

Probably because nobody pushed it hard enough to make it happen.


T

-- 
It only takes one twig to burn down a forest.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread H. S. Teoh via Digitalmars-d
On Tue, Aug 28, 2018 at 07:39:20PM +, tide via Digitalmars-d wrote:
> On Tuesday, 28 August 2018 at 17:02:46 UTC, H. S. Teoh wrote:
> > On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via
> > Digitalmars-d wrote: [...]
> > > There are still valid use cases where const should be "broken".
> > > One of them is mutex (another one caching). I have very little
> > > experience in multi-threaded programming, but what do you think
> > > about "mutable" members, even though the object is const?
> > 
> > The problem with compromising const is that it would invalidate any
> > guarantees const may have provided.  Const in D is not the same as
> > const in languages like C++; const in D means *physical* const, as
> > in, the data might reside in ROM where it's physically impossible to
> > modify.  Allowing the user to bypass this means UB if the data
> > exists in ROM.
> 
> I feel that's such a narrow use case; wouldn't you just use something
> like immutable instead?

The problem is that immutable implicitly converts to const.  Basically,
const means "I guarantee I will never modify this data (though someone
else might)", and immutable means "nobody will ever modify this data".
You cannot allow const to mutate without risking breakage with
immutable.  If the original data came from a mutable reference, you can
probably get away with casting const away. But if it came from an
immutable object, casting const away is UB.  Allowing const to be
"sometimes" modified is also UB.


> > Plus, the whole point of const in D is that it is
> > machine-verifiable, i.e., the compiler checks that the code does not
> > break const in any way and therefore you are guaranteed (barring
> > compiler bugs) that the data does not change.  If const were not
> > machine-verifiable, it would be nothing more than programming by
> > convention, since it would guarantee nothing.  Allowing const to be
> > "broken" somewhere would mean it's no longer machine-verifiable (you
> > need a human to verify whether the semantics are still correct).
> 
> This is still not true; it is not machine-verifiable as it is. It can
> be bypassed quite easily, as a const object can be assigned from a
> non-const one. There's no way to offer that guarantee.

You misunderstand. Const means "this code cannot modify this object no
matter what".  It does not guarantee somebody else can't modify it (you
want immutable for that).  Both mutable and immutable implicitly convert
to const, therefore it is imperative that code that handles const never
modifies the data, because you don't know the provenance of the data: it
could have come from an immutable object.  Allowing const to "sometimes"
modify stuff will violate immutable and cause UB.

Whether a piece of code modifies the data is certainly
machine-verifiable -- but only if there are no backdoors to const. If
there are, then the compiler cannot feasibly verify const, since it
would need to transitively examine all code called by the code in
question, but the source code may not be always available.

Even if the data came from a mutable object, it does not make it any
less machine-verifiable, since what we're verifying is "this code does
not modify this data", not "this data never changes".  For the latter,
immutable provides that guarantee, not const.  It is possible, for
example, to obtain a const reference to a mutable object, and have one
thread modify the object (via the mutable reference) while another
thread reads it (via the const reference).  You cannot guarantee that
the data itself won't change, but you *can* guarantee that the code
holding the const reference (without access to the mutable reference)
isn't the one making the changes.
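
As a rough single-threaded illustration of that last point (the two-thread
case is simplified here to one thread, and `Counter` and `peek` are
hypothetical names), the data behind a const reference can change, but not
through the const reference:

```d
struct Counter
{
    int value;
}

// Holds only a const reference: it can read but never write.
int peek(ref const Counter c) pure nothrow @safe
{
    static assert(!__traits(compiles, c.value = 0));
    return c.value;
}

void main()
{
    auto counter = Counter(10);
    const(Counter)* view = &counter;   // const view of mutable data

    assert(peek(*view) == 10);
    counter.value = 11;                // change made via the mutable reference
    assert(peek(*view) == 11);         // the const view observes the change
}
```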


T

-- 
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and, more important, to tasks that have not yet
been conceived. -- Michael B. Allen


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread tide via Digitalmars-d

On Tuesday, 28 August 2018 at 17:02:46 UTC, H. S. Teoh wrote:
On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via 
Digitalmars-d wrote: [...]
There are still valid use cases where const should be 
"broken". One of them is mutex (another one caching). I have 
very little experience in multi-threaded programming, but what 
do you think about "mutable" members, even though the object is 
const?


The problem with compromising const is that it would invalidate 
any guarantees const may have provided.  Const in D is not the 
same as const in languages like C++; const in D means 
*physical* const, as in, the data might reside in ROM where 
it's physically impossible to modify. Allowing the user to 
bypass this means UB if the data exists in ROM.


I feel that's such a narrow use case; wouldn't you just use 
something like immutable instead?


Plus, the whole point of const in D is that it is 
machine-verifiable, i.e., the compiler checks that the code 
does not break const in any way and therefore you are 
guaranteed (barring compiler bugs) that the data does not 
change.  If const were not machine-verifiable, it would be 
nothing more than programming by convention, since it would 
guarantee nothing.  Allowing const to be "broken" somewhere 
would mean it's no longer machine-verifiable (you need a human 
to verify whether the semantics are still correct).


This is still not true, it is not machine verifiable as it is. It 
can be bypassed quite easily, as a const object can be assigned 
from a non-const one. There's no way to offer that guarantee.


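import std.format : format;

struct Type
{
    int value;
}

void test(const ref Type type, int* ptr)
{
    int first = type.value;

    // Writes through the mutable pointer, which aliases type.value.
    *ptr = first + 1;

    // This assertion fires: the const view observes the change.
    assert(type.value == first, format!"%d != %d"(type.value, first));
}

void main()
{
    Type type = Type(10);
    test(type, &type.value);
}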


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread aliak via Digitalmars-d

On Tuesday, 28 August 2018 at 17:53:36 UTC, H. S. Teoh wrote:

On Tue, Aug 28, 2018 at 10:20:06AM -0700, Manu via
D has no way to express head-const, and it turns out it's a 
tremendously useful concept.


I can live without head-const... but what *really* makes const 
painful for me is the lack of head-mutable. I.e., given a const 
container (which implies const objects), there is no universal 
way to obtain a mutable reference to said const objects, unless 
you tread into UB territory by forcefully casting it away.  
This makes const so limited in applicability that, for the most 
part, I've given up using const at all, in spite of having 
tried quite hard to use it as much as possible for years.


Simen's opHeadMutable [0] was a pretty good solution to this const 
range stuff, but for some reason (not specified by anyone in the 
thread) it didn't seem to catch on :/


[0] 
https://forum.dlang.org/post/zsaqtmvqmfkzhrmrm...@forum.dlang.org





Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread H. S. Teoh via Digitalmars-d
On Tue, Aug 28, 2018 at 10:20:06AM -0700, Manu via Digitalmars-d wrote:
[...]
> The reality is though, that D's const is not actually very useful, and
> C++'s const is.

Actually, I think C++ const is not very useful, because it guarantees
nothing. At the most, it's just a sanity checker to make sure the
programmer didn't accidentally do something dumb. But given an opaque
C++ function that takes const parameters, there is ZERO guarantee that
it doesn't actually modify stuff behind your back, and do so legally
(per spec).  I mean, how many times have you written const_cast<...>
just to get a piece of code to compile?  I know I've been guilty of this
in many places, because it simply isn't worth the effort to track down
all the places of the code that you need to fix to make it
const-correct.  So basically, C++ const is nothing more than an
annotation that isn't really enforced.

But you're spot on about D's const, though.  While D's const *does*
provide real guarantees (unless you tread into UB territory by casting
it away), that also limits its scope so much that it's rarely useful
outside of rather narrow confines.  Yet because it's so strict, using it
requires investing significant effort.  So you end up with the
unfortunate situation of "a lot of effort" + "limited usefulness" which
for many people equals "not worth using".


> D has no way to express head-const, and it turns out it's a
> tremendously useful concept.

I can live without head-const... but what *really* makes const painful
for me is the lack of head-mutable. I.e., given a const container (which
implies const objects), there is no universal way to obtain a mutable
reference to said const objects, unless you tread into UB territory by
forcefully casting it away.  This makes const so limited in
applicability that, for the most part, I've given up using const at all,
in spite of having tried quite hard to use it as much as possible for
years.


[...]
> I've also had occasional success refactoring to support const, but
> it's certainly the case that success is not guaranteed. And it's
> always time consuming regardless.

Yes, it's time-consuming.  And takes significant effort.  In spite of
being rather limited in applicability.  In my experience, it's useful
for isolated pieces of code near the bottom of the program's call chain,
where there is little or no additional dependencies.  But it's just too
cumbersome to use at any higher level, and a royal pain in generic code
(which I'm quite heavy on).  It probably *can* be made to work in most
cases, but it falls under my umbrella category of "too much effort
needed, only marginal benefits, therefore not worth it".


T

-- 
A linguistics professor was lecturing to his class one day. "In
English," he said, "A double negative forms a positive. In some
languages, though, such as Russian, a double negative is still a
negative. However, there is no language wherein a double positive can
form a negative." A voice from the back of the room piped up, "Yeah,
yeah."


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Manu via Digitalmars-d
On Tue, 28 Aug 2018 at 00:55, Walter Bright via Digitalmars-d
 wrote:
>
> On 8/26/2018 11:16 PM, Manu wrote:
> >> The code looks the same, and in fact, is about 98% the same.
> > This code appears to be a mechanical translation.
>
> It's not. It's by hand. But I had a specific goal of minimizing the diffs, so
> that if the translation didn't work, it reduced the number of places to look
> for the mistake. And in fact, this has saved me a LOT of grief :-)
>
>
> > That's not what
> > happened in this case; he wrote his game in D from scratch.
> > It was just that he arrived at mostly the same place. He was googling
> > for styling and sample material, but I suspect the problem was a lack
> > of topical material demonstrating how he might write his D code
> > differently.
>
> It takes time to learn how to write idiomatic D effectively. I'm still
> learning how to do it right, too.
>
>
> > It's also the case that the significant difference between C++ and D
> > (in my experience) mostly come down to: D has modules, tidier meta,
> > UDA's, slices, and ranges/UFCS. In trade, D struggles with const, and
> > ref is broken.
> > If your code doesn't manifest some gravity towards one of those
> > features, it will tend to be quite same-ey, and advantage may not be
> > particularly apparent.
>
> I suspect that is still a bit stuck on looking at individual instruments and
> not seeing the orchestra.
>
> Let's take the much-maligned D const. It isn't C++ const (let's call that
> "head-const", because that's what it is). Head-const for a function parameter
> tells us very little about what may happen to it in the function. You can
> pass a head-const reference to a container, and have the function
> add/change/delete every element of that container, all without a peep from
> any C++ tool. Looking at the function signature, you've really got no clue
> whatsoever.
>
> The reason people have trouble with transitive-const is that they are still
> programming in C++, where they *do* add/change/delete every member of the
> "const" container.

We understand... really. I've spent a decade digesting this, and I'm
not one of those that has ever really complained about D's const. I've
always mostly bought into it philosophically.

The reality is though, that D's const is not actually very useful, and
C++'s const is. D has no way to express head-const, and it turns out
it's a tremendously useful concept.
As I said, I tend to create a head-const hack to use in its place, and
that gets me out of jail... but it's specified as undefined behaviour,
which isn't great.

> That includes me. I try to add transitive-const, and it won't compile,
> because I as well am used to replacing the engine and tail lights in my
> head-const car. In order to use transitive-const, it's forcing me to
> fundamentally re-think how I organize code into functions.

I often walk the same path, but sometimes it doesn't yield success,
and in many cases, it just doesn't actually model the problem.
If the problem is that I want a const container class; I don't want a
function I pass a vector to mutate the container structure, that is,
add/remove/reorder/reallocate the array, but I DO intend it to
interact with mutable elements.
That's a perfectly valid problem structure, and it turns out, it's
very common. I would attribute the given element type as const if I
wanted the const-ness to propagate to the elements, that's obvious and
convenient.
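
To make the shape of that problem concrete, here is a small sketch. It is
not the head-const hack mentioned above, only one library-level
approximation, and `Particle`, `FixedView` and `boost` are hypothetical
names. The wrapper exposes no way to append to or reseat the slots, while
the elements reached through it stay mutable. The last lines show that
plain transitive const freezes the elements as well:

```d
struct Particle
{
    int energy;
}

// Hypothetical wrapper: fixed structure, mutable elements.
struct FixedView(T)
{
    private T[] items;

    this(T[] items) { this.items = items; }

    size_t length() const { return items.length; }

    // Returns the element by value. For pointer elements the slot cannot
    // be reseated through the view, but the pointee stays mutable.
    T opIndex(size_t i) { return items[i]; }
}

void boost(FixedView!(Particle*) ps)
{
    foreach (i; 0 .. ps.length)
        ps[i].energy += 1;   // mutating an element is fine
}

void main()
{
    auto a = new Particle;
    auto b = new Particle;
    a.energy = 1;
    b.energy = 2;

    auto ps = FixedView!(Particle*)([a, b]);
    boost(ps);
    assert(a.energy == 2 && b.energy == 3);

    // The structure cannot be changed through the view.
    static assert(!__traits(compiles, ps ~= a));
    static assert(!__traits(compiles, ps[0] = b));

    // Transitive const, by contrast, also freezes the elements.
    const(Particle*[]) frozen = [a, b];
    static assert(!__traits(compiles, frozen[0].energy = 5));
}
```

The cost is that every container needs such a view written by hand, which
is part of the complaint.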

It might be that we can sufficiently rearrange all manner of
conventional wisdom to interact successfully with D's const, but I've
been watching this space for 10 years, and nobody has produced any
such evidence, or articles that we can read and understand how to
wrangle successful solutions.
This particular class of problem reeks of the typical criticism for
Rust... that is, I have better things to be doing with my time than
trying to find awkward alternative code structures to pacify the
const-checker, when in reality, const-container-of-mutable-elements is
simply the correct conceptual modeling of the problem.

Anyway, I'm not fighting that battle. I have enough of my own.

> For example, dmd is full of functions that combine data-gathering with
> taking-action. I've been reorganizing to separate data-gathering and
> taking-action into separate functions. The former can be transitive-const,
> maybe even pure. And I like the results, the code becomes much easier to
> understand.

I've also had occasional success refactoring to support const, but
it's certainly the case that success is not guaranteed. And it's
always time consuming regardless.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread H. S. Teoh via Digitalmars-d
On Tue, Aug 28, 2018 at 08:18:57AM +, Eugene Wissner via Digitalmars-d wrote:
[...]
> There are still valid use cases where const should be "broken". One of
> them is mutex (another one caching). I have very little experience in
> multi-threaded programming, but what do you think about "mutable"
> members, even though the object is const?

The problem with compromising const is that it would invalidate any
guarantees const may have provided.  Const in D is not the same as const
in languages like C++; const in D means *physical* const, as in, the
data might reside in ROM where it's physically impossible to modify.
Allowing the user to bypass this means UB if the data exists in ROM.

Plus, the whole point of const in D is that it is machine-verifiable,
i.e., the compiler checks that the code does not break const in any way
and therefore you are guaranteed (barring compiler bugs) that the data
does not change.  If const were not machine-verifiable, it would be
nothing more than programming by convention, since it would guarantee
nothing.  Allowing const to be "broken" somewhere would mean it's no
longer machine-verifiable (you need a human to verify whether the
semantics are still correct).

Many of D's const woes can actually be solved if we had a
language-supported way of declaring the equivalence between const(U!T)
and U!(const(T)), AKA head-mutable.  The language already supports a
(very) limited set of such conversions, e.g., const(T*) is assignable to
const(T)*, because you're just making a copy of the pointer, but the
target is still unchangeable.  However, because there is no way to
specify such a conversion in a user-defined type, that means things like
RefCounted, or caches, or mutexes, cannot be made to work without either
ugly workarounds or treading into UB territory by casting away const.
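
A short sketch of the asymmetry described above, with `Wrapper` as a
made-up stand-in for any user-defined template. The built-in pointer and
slice conversions compile, while the same conversion for a user-defined
wrapper does not, because the two instantiations are unrelated types:

```d
// Hypothetical user-defined wrapper around a pointer.
struct Wrapper(T)
{
    T* ptr;
}

void main()
{
    int x = 42;

    // Built-in: copying a const pointer gives a mutable pointer to const.
    const(int*) p = &x;
    const(int)* q = p;
    assert(*q == 42);

    // Built-in: the same holds for slices.
    const(int[]) a = [1, 2, 3];
    const(int)[] b = a;
    b = b[1 .. $];             // the slice itself can be re-pointed
    assert(b.length == 2);

    // User-defined: no such conversion exists.
    const(Wrapper!int) w = Wrapper!int(&x);
    static assert(!__traits(compiles, { Wrapper!(const(int)) v = w; }));
}
```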

But if there is a way for a user-defined template U to define a
conversion from const(U!T) to U!(const(T)) (the conversion code, of
course, would have to be const-correct and verifiable by the compiler),
then we could make it so that U!(const(T)) contained a mutable portion
(e.g., the refcount, mutex, cache, etc.) and an immutable portion (the
reference to the const object).
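
As a sketch only: what such a conversion might look like if written by hand
today. The method name `headMutable` and the struct `Box` are hypothetical,
and the language does not apply this conversion implicitly, which is exactly
the missing piece. The example also sidesteps the hard part (a real refcount
is shared between copies, not stored by value):

```d
struct Box(T)
{
    T* payload;
    int hits;   // stand-in for bookkeeping such as a cache counter

    // Hypothetical conversion from const(Box!T) to Box!(const(T)).
    // The body is const-correct: const(T*) converts to const(T)*.
    Box!(const(T)) headMutable() const
    {
        return Box!(const(T))(payload, hits);
    }
}

void main()
{
    int x = 7;
    const(Box!int) locked = Box!int(&x);

    auto view = locked.headMutable();
    static assert(is(typeof(view) == Box!(const(int))));

    view.hits += 1;                 // the mutable portion
    assert(view.hits == 1);
    assert(*view.payload == 7);     // the referenced object is readable...
    static assert(!__traits(compiles, *view.payload = 8));  // ...but still const
}
```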


T

-- 
In order to understand recursion you must first understand recursion.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread H. S. Teoh via Digitalmars-d
On Mon, Aug 27, 2018 at 06:11:14PM -0700, Walter Bright via Digitalmars-d wrote:
> On 8/27/2018 10:08 AM, H. S. Teoh wrote:
> > Const in D makes sense as-is.  Though, granted, its infectiousness means
> > its scope is actually very narrow, and as a result, we ironically
> > can't use it in very many places, and so its touted benefits only
> > rarely apply. :-(  Which also means that it's taking up a lot of
> > language design real estate with not many benefits to show for it.
> 
> D const is of great utility if you're interested in functional
> programming.  Using it has forced me to rethink how I separate tasks
> into functions, and the result is for the better.
> 
> I agree that D const has little utility if you try to program in C++
> style.

I am very interested in functional programming, yet ironically, one of
D's top functional programming selling points, range-based programming,
interacts badly with const.  Just ask Jonathan about using ranges with
const, and you'll see what I mean. :-)

The very design of ranges in D requires that the range be mutable.
However, because const is infectious, this makes it a royal pain to use
in practice.  Take, for example, a user-defined container type, let's
call it L. For argument's sake, let's say it's a linked list.  And let's
say the list elements are reference-counted -- we'll write that as
RefCounted!Elem even though this argument isn't specific to the current
Phobos implementation of RefCounted.

As experience has shown in the past, it's usually a good idea to
separate the container from the range that iterates over it, so an
obvious API choice would be to define, say, an .opSlice method for L
that returns a range over its elements.

Now, logically speaking, iterating over L shouldn't modify it, so it
would make sense that .opSlice should be const. So we have:

    struct L {
        private RefCounted!Elem head, tail;
        auto opSlice() const {
            ...
        }
    }

The returned range, however, must be mutable, since otherwise you
couldn't use .popFront to iterate over it (and correspondingly, Phobos
isInputRange would evaluate to false).  But here's the problem: because
opSlice is declared const, that means `this` is also const, which means
this.head and this.tail are also const.  But since this.head is const,
that means you couldn't do this:

    auto opSlice() const {
        struct Result {
            RefCounted!(const(Elem)) current;
            ... // rest of range API
            void popFront() {
                current = current.next;
                // Error: cannot assign const(RefCounted!Elem)
                // to RefCounted!(const(Elem))
            }
        }
        return Result(head); // <-- Error: cannot assign
                             // const(RefCounted!Elem) to RefCounted!(const(Elem))
    }

This would have worked had we used pointers instead, because the
compiler knows that it's OK to assign const(Elem*) to const(Elem)*.
However, in this case, the compiler has no way of knowing that it is
safe to assign const(RefCounted!Elem) to RefCounted!(const(Elem)).
Indeed, they are different types, and the language currently has no way
of declaring the head-mutable construct required here.
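
For comparison, a runnable sketch of the pointer-based variant that does
work, with raw pointers replacing RefCounted. `Node`, `List` and `prepend`
are hypothetical names chosen for the illustration:

```d
import std.range.primitives : isInputRange;

struct Node
{
    int value;
    Node* next;
}

struct List
{
    private Node* head;

    void prepend(int v)
    {
        auto n = new Node;
        n.value = v;
        n.next = head;
        head = n;
    }

    // A const method that still returns a mutable range over const nodes.
    auto opSlice() const
    {
        static struct Result
        {
            const(Node)* current;   // mutable pointer to const data

            @property bool empty() const { return current is null; }
            @property int front() const { return current.value; }
            void popFront() { current = current.next; }
        }
        // head is const(Node*) here; copying it yields const(Node)*.
        return Result(head);
    }
}

void main()
{
    List list;
    list.prepend(3);
    list.prepend(2);
    list.prepend(1);

    const List frozen = list;
    auto r = frozen[];
    static assert(isInputRange!(typeof(r)));

    int[] seen;
    foreach (v; r)
        seen ~= v;
    assert(seen == [1, 2, 3]);
}
```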

This is only the tip of the iceberg, of course.  If you then try to add
a method to RefCounted to make it convert const(RefCounted!T) to
RefCounted!(const(T)), then you'll be led down a rabbit hole of further
problems with const (e.g., how to implement ref-counting with const
objects in a way that doesn't violate the type system) until you reach
the point where it's impossible to proceed without casting away const
somehow.  Unfortunately, the spec says that's Undefined Behaviour.  So
you're on your own.

This is just one example among many, that const is hard to use in the
general case.  It works fairly well for a narrow number of cases, such
as for built-in types, but once you start generalizing your code, you'll
find brick walls in undesired places, the workarounds for which require
so much effort as to offset any benefits that const may have brought.

TL;DR: const is beautiful in theory, but hard to use in practice.  So
hard that it's often not worth the trouble, despite the benefits that it
undoubtedly does provide.

P.S. If D had the concept of head-mutable, a lot of this pain (though
not all) would have been alleviated.


T

-- 
"I'm running Windows '98." "Yes." "My computer isn't working now." "Yes, you 
already said that." -- User-Friendly


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Mike Parker via Digitalmars-d

On Tuesday, 28 August 2018 at 08:44:26 UTC, Chris wrote:



Last but not least, if it's true that the D Foundation has 
raised only 3.2K, then there's something seriously wrong.


The Foundation has significantly more than 3.2k. The Open 
Collective account is relatively new and is but one option. 
People also donate via PayPal and other means [1], with several 
monthly contributors. The Foundation is paying two full-time 
developers, pays me for part-time work, pays out bounties for 
guest posts on the D Blog, pays out bounties for specific coding 
tasks, contributes to the funding of DConf, and more.


Soon there will be more fundraising drives for targeted 
initiatives, like the test drive we did with the VS Code plugin, 
to make up for the lack of donations of time. Some of them will 
be for paying people to fix onerous issues in Bugzilla. It's a 
long-term project for me.


[1] https://dlang.org/foundation/donate.html


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Chris via Digitalmars-d

On Tuesday, 28 August 2018 at 08:44:26 UTC, Chris wrote:


When people choose a programming language, there are several 
boxes that have to be ticked, like for example:


- what's the future of language X? (guarantees, stability)
- how easy is it to get going (from "Hello world" to a complete 
tool chain)

- will it run on ARM?
- will it be a good choice for the Web (e.g. webasm)?
- how good is it at data processing / number grinding
- etc.



I don't know if all their claims are 100% true, but let that sink 
in for a while:


https://julialang.org/.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Chris via Digitalmars-d

On Tuesday, 28 August 2018 at 07:30:01 UTC, Walter Bright wrote:

On 8/27/2018 2:14 AM, Chris wrote:



bad feeling about the way things are going atm.


I can quote you a long list of problems that are obvious 
only in hindsight, by world leading development teams.


Start by watching the documentary series "Aviation Disasters", 
look at Challenger, Deepwater Horizon, Fukushima, Apollo 1, 
Apollo 13, the World Trade Centers, etc. Of course, there are a 
number of them in C, C++, Java, Javascript, basically every 
language I've worked with.


I'll guarantee every non-trivial project you've worked on has 
problems that are obvious only in hindsight, too. If you wait 
till it's perfect, you'll never ship, and yet it'll *still* 
have problems.


I'm not making excuses for mistakes - just don't have 
unworkable requirements.


This is all good and well and I know that anyone who develops 
software shoots him/herself in the foot sooner or later. But this 
is not the same situation. If you have to ship something by 
date X, then you are under pressure and naturally make mistakes 
that are obvious only in hindsight. But D is not under pressure 
to include new features so frequently. There's absolutely no 
reason to rush into something that eats up a lot of your time 
(better spent on more urgent problems) and by so doing produce 
possible breakages.


The end of the day is, does D get the job done for you better 
than other languages? That's a decision only you can make.


It has done a better job until recently. The problems are not 
things like @safe, `const` and whatnot; the problems are very 
practical issues such as fear of breakage / time spent fixing 
things, running the code on ARM, and integration into other 
technologies (webasm).


Since the D Foundation was founded I really thought that part of 
the effort would go into stabilizing the language and developing 
better tools for various aspects of programming (not just 
language features). Programming is so much more than just 
language features, and languages that offer the "so much more" 
part are usually the ones people adopt. But somehow D still seems 
to be in its hobby hacker days. Features are first and foremost, 
everything else comes second. But features get "ripped" by other 
programming languages and they can pick and choose, because they 
know what really worked in D, while D has to struggle with the 
things that didn't work or only half worked.


Laeeth was talking about being analytical about the whole thing. 
Why not find out what features are really being used? I.e. does 
the majority really need - for practical purposes - partially 
constructed objects?


When people choose a programming language, there are several 
boxes that have to be ticked, like for example:


- what's the future of language X? (guarantees, stability)
- how easy is it to get going (from "Hello world" to a complete 
tool chain)

- will it run on ARM?
- will it be a good choice for the Web (e.g. webasm)?
- how good is it at data processing / number grinding
- etc.

I think the D Foundation should focus on the more "trivial" 
things too. If a company is asked to develop a data grinding web 
application along with a smart phone app - will it choose D? If a 
company offers localization services and translations - will it 
choose D (autodecode)?


The D community / leadership is acting as if they had all the 
time in the world. But other languages are moving fast and they 
learn from D what _not_ to do.


Last but not least, if it's true that the D Foundation has raised 
only 3.2K, then there's something seriously wrong.




Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Eugene Wissner via Digitalmars-d

On Tuesday, 28 August 2018 at 07:53:34 UTC, Walter Bright wrote:
Let's take the much-maligned D const. It isn't C++ const (let's 
call that "head-const", because that's what it is). Head-const 
for a function parameter tells us very little about what may 
happen to it in the function. You can pass a head-const 
reference to a container, and have the function 
add/change/delete every element of that container, all without 
a peep from any C++ tool. Looking at the function signature, 
you've really got no clue whatsoever.


The reason people have trouble with transitive-const is that 
they are still programming in C++, where they *do* 
add/change/delete every member of the "const" container.


That includes me. I try to add transitive-const, and it won't 
compile, because I as well am used to replacing the engine and 
tail lights in my head-const car. In order to use 
transitive-const, it's forcing me to fundamentally re-think how 
I organize code into functions.


For example, dmd is full of functions that combine 
data-gathering with taking-action. I've been reorganizing to 
separate data-gathering and taking-action into separate 
functions. The former can be transitive-const, maybe even pure. 
And I like the results, the code becomes much easier to 
understand.


There are still valid use cases where const should be "broken". 
One of them is mutex (another one caching). I have very little 
experience in multi-threaded programming, but what do you think 
about "mutable" members, even though the object is const?


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Walter Bright via Digitalmars-d

On 8/26/2018 11:16 PM, Manu wrote:

The code looks the same, and in fact, is about 98% the same.

This code appears to be a mechanical translation.


It's not. It's by hand. But I had a specific goal of minimizing the diffs, so 
that if the translation didn't work, it reduced the number of places to look for 
the mistake. And in fact, this has saved me a LOT of grief :-)




That's not what
happened in this case; he wrote his game in D from scratch.
It was just that he arrived at mostly the same place. He was googling
for styling and sample material, but I suspect the problem was a lack
of topical material demonstrating how he might write his D code
differently.


It takes time to learn how to write idiomatic D effectively. I'm still learning 
how to do it right, too.




It's also the case that the significant difference between C++ and D
(in my experience) mostly come down to: D has modules, tidier meta,
UDA's, slices, and ranges/UFCS. In trade, D struggles with const, and
ref is broken.
If your code doesn't manifest some gravity towards one of those
features, it will tend to be quite same-ey, and advantage may not be
particularly apparent.


I suspect that is still a bit stuck on looking at individual instruments and not 
seeing the orchestra.


Let's take the much-maligned D const. It isn't C++ const (let's call that 
"head-const", because that's what it is). Head-const for a function parameter 
tells us very little about what may happen to it in the function. You can pass a 
head-const reference to a container, and have the function add/change/delete 
every element of that container, all without a peep from any C++ tool. Looking 
at the function signature, you've really got no clue whatsoever.


The reason people have trouble with transitive-const is that they are still 
programming in C++, where they *do* add/change/delete every member of the 
"const" container.


That includes me. I try to add transitive-const, and it won't compile, because I 
as well am used to replacing the engine and tail lights in my head-const car. In 
order to use transitive-const, it's forcing me to fundamentally re-think how I 
organize code into functions.


For example, dmd is full of functions that combine data-gathering with 
taking-action. I've been reorganizing to separate data-gathering and 
taking-action into separate functions. The former can be transitive-const, maybe 
even pure. And I like the results, the code becomes much easier to understand.
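
A toy sketch of that split, not taken from dmd (`Inventory`, `findEmpty`
and `restock` are invented for the illustration). The data-gathering step
is const and pure, and only the separate action step mutates:

```d
struct Inventory
{
    int[] stock;

    // Data-gathering: const and pure, so it cannot touch the inventory.
    size_t[] findEmpty() const pure nothrow @safe
    {
        size_t[] result;
        foreach (i, qty; stock)
            if (qty == 0)
                result ~= i;
        return result;
    }

    // Taking action: the only place that mutates.
    void restock(const(size_t)[] slots, int amount) pure nothrow @safe
    {
        foreach (i; slots)
            stock[i] += amount;
    }
}

void main()
{
    auto inv = Inventory([5, 0, 3, 0]);

    const empty = inv.findEmpty();
    assert(empty.length == 2 && empty[0] == 1 && empty[1] == 3);

    inv.restock(empty, 10);
    assert(inv.stock == [5, 10, 3, 10]);
}
```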


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-28 Thread Walter Bright via Digitalmars-d

On 8/27/2018 2:14 AM, Chris wrote:

On Sunday, 26 August 2018 at 22:44:05 UTC, Walter Bright wrote:
Because nobody thought about that issue before. A lot of things only become 
apparent in hindsight.
QED. With this approach you do more harm than good. I have a bad feeling about 
the way things are going atm.


I can quote you a long list of problems that are obvious only in hindsight, 
by world leading development teams.


Start by watching the documentary series "Aviation Disasters", look at 
Challenger, Deepwater Horizon, Fukushima, Apollo 1, Apollo 13, the World Trade 
Centers, etc. Of course, there are a number of them in C, C++, Java, Javascript, 
basically every language I've worked with.


I'll guarantee every non-trivial project you've worked on has problems that are 
obvious only in hindsight, too. If you wait till it's perfect, you'll never 
ship, and yet it'll *still* have problems.


I'm not making excuses for mistakes - just don't have unworkable requirements.

The end of the day is, does D get the job done for you better than other 
languages? That's a decision only you can make.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-27 Thread tide via Digitalmars-d

On Tuesday, 28 August 2018 at 01:11:14 UTC, Walter Bright wrote:

On 8/27/2018 10:08 AM, H. S. Teoh wrote:
Const in D makes sense as-is.  Though, granted, its infectiousness
means its scope is actually very narrow, and as a result, we
ironically can't use it in very many places, and so its touted
benefits only rarely apply. :-(  Which also means that it's taking
up a lot of language design real estate with not many benefits to
show for it.


D const is of great utility if you're interested in functional 
programming. Using it has forced me to rethink how I separate 
tasks into functions, and the result is for the better.


I agree that D const has little utility if you try to program 
in C++ style.


It doesn't play well with templates or any of the like either, so 
even if you try to do template programming it is just better to 
not use it.


I'm curious as to what an example of this D const for functional 
programming would look like.


Re: Dicebot on leaving D: It is anarchy driven development in all its glory.

2018-08-27 Thread Walter Bright via Digitalmars-d

On 8/27/2018 10:08 AM, H. S. Teoh wrote:

Const in D makes sense as-is.  Though, granted, its infectiousness means
its scope is actually very narrow, and as a result, we ironically can't
use it in very many places, and so its touted benefits only rarely
apply. :-(  Which also means that it's taking up a lot of language
design real estate with not many benefits to show for it.


D const is of great utility if you're interested in functional programming. 
Using it has forced me to rethink how I separate tasks into functions, and the 
result is for the better.


I agree that D const has little utility if you try to program in C++ style.

