Re: Why not flag away the mistakes of the past?

2018-03-09 Thread Chris via Digitalmars-d

On Friday, 9 March 2018 at 06:14:05 UTC, Jonathan M Davis wrote:



We'll make breaking changes if we judge the gain to be worth 
the pain, but we don't want to be constantly breaking people's 
code, and some changes are large enough that there's arguably 
no justification for them, because they would simply be too 
disruptive. Because of how common string processing is and how 
integrated auto-decoding is into D's string processing, it is 
very difficult to come up with a way to change it which isn't 
simply too disruptive to be justified, even though we want to 
change it. So, this is a particularly difficult case, and how 
we're going to end up handling it remains to be seen. Thus far, 
we've mainly worked on providing better ways to get around it, 
because we can do that without breaking code, whereas actually 
removing it is extremely difficult.


- Jonathan M Davis


It's already been said (by myself and others) that we should 
actually try to remove it (with a compiler switch) and then see 
what happens, how much code actually breaks, and based on that 
experience we can come up with a strategy. I've already said that 
I'm willing to try it on my code (which is almost 100% string 
processing). Why not _try_ it? Later we can still philosophize.


Re: Why not flag away the mistakes of the past?

2018-03-09 Thread Gary Willoughby via Digitalmars-d

On Wednesday, 7 March 2018 at 17:11:55 UTC, H. S. Teoh wrote:

Kill autodecoding, I say. Kill it with fire!!


T


Please!!!



Re: Why not flag away the mistakes of the past?

2018-03-09 Thread Guillaume Piolat via Digitalmars-d

On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
Yeah, the only reason autodecoding survived in the beginning 
was because Andrei (wrongly) thought that a Unicode code point 
was equivalent to a grapheme.  If that had been the case, the 
cost associated with auto-decoding may have been justifiable.  
Unfortunately, that is not the case, which greatly diminishes 
most of the advantages that autodecoding was meant to have.  So 
it ended up being something that incurred a significant 
performance hit, yet did not offer the advantages it was 
supposed to.  To fully live up to Andrei's original vision, it 
would have to include grapheme segmentation as well.  
Unfortunately, graphemes are of arbitrary length and cannot in 
general fit in a single dchar (or any fixed-size type), and 
grapheme segmentation is extremely costly to compute, so doing 
it by default would kill D's string manipulation performance.



I remember something a bit different last time it was discussed:

 - removing auto-decoding broke a lot of code; it's used in lots 
of places

 - the performance loss could be mitigated with .byCodeUnit every time
 - Andrei correctly advocating against breakage

Personally I do use auto-decoding, often iterating by code point, 
and use it for fonts and parsers. It's correct for a large 
subset of languages. You gave us a feature and now we are using 
it ;)


Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Jonathan M Davis via Digitalmars-d
On Friday, March 09, 2018 03:16:03 Taylor Hillegeist via Digitalmars-d 
wrote:

> I wasn't so much asking about auto-decoding in particular, more
> about the mentality and methods of breaking changes.
>
> In a way any change to the compiler is a breaking change when it
> comes to the configuration.
>
> I for one never expect code to compile on the latest compiler; it
> has to be the same compiler, same version, for the code base to
> work as expected.
>
> At one point I envisioned every file with a header that states
> the version of the compiler required for that module. A
> sophisticated configuration tool could take and compile each
> module with its respective version and then one could link. (this
> could very well be the worst idea ever)
>
> I'm not saying we should be quick to change... oh no, that would
> be very bad. But after you sit with the filth of your decisions
> long and hard and are certain that it is indeed bad, there should
> be a plan for action and change. And when it comes to change it
> should be an evolution, not a revolution.
>
> It is good to avoid the so easily accepted mentality of legacy...
> Why do you do it that way? "It's because we've always done it
> that way."
>
> The reason I like D is often that, driven by its community, it
> innovates and renovates into a language that is honestly really
> fun to use. (most of the time)

Any and all changes need to be weighed for their pros and cons. No one likes
it when their code breaks, and ideally, programs would work pretty much
forever without modification, but there are changes that are worth dealing
with code breakage. Part of the problem is deciding which changes are worth
it, and some of that depends on what the migration path would be. Some stuff
can be changed with minimal pain, and other stuff can't really be changed
without breaking everything. And the more D code that exists, the higher the
cost for any change. The drive to make D perfect and the need to be able to
use and rely on D code working in production without having to keep changing
it are always in conflict.

As Walter likes to say, some folks don't want you to break anything, whereas
some folks want breaking changes, and they're frequently the same people.

Ideally, any D code that you write would work permanently as-is. Also
ideally, any and all problems or pain points with D and its standard library
would be fixed. Those two things are in complete contradiction of one
another, and it's not always easy to judge how to deal with that. Sometimes,
it means that we're stuck with legacy decisions, because fixing them is too
costly, whereas other times, it means that we deprecate something, and some
of the D code out there has to be updated, or it won't compile anymore in a
release somewhere in the future. Either way, outright breaking code
immediately, with no migration process is pretty much always unacceptable.

We'll make breaking changes if we judge the gain to be worth the pain, but
we don't want to be constantly breaking people's code, and some changes are
large enough that there's arguably no justification for them, because they
would simply be too disruptive. Because of how common string processing is
and how integrated auto-decoding is into D's string processing, it is very
difficult to come up with a way to change it which isn't simply too
disruptive to be justified, even though we want to change it. So, this is a
particularly difficult case, and how we're going to end up handling it
remains to be seen. Thus far, we've mainly worked on providing better ways
to get around it, because we can do that without breaking code, whereas
actually removing it is extremely difficult.

- Jonathan M Davis



Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Taylor Hillegeist via Digitalmars-d

On Thursday, 8 March 2018 at 17:14:16 UTC, Jonathan M Davis wrote:
On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via 
Digitalmars-d wrote:

On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis

wrote:
> On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via
>
> Digitalmars-d wrote:
>> On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor 
>> Hillegeist

>>
>> wrote:
>> > That way the breaking change was easily fixable, and the 
>> > mistakes of the past not forever. Is it just the cost of 
>> > maintenance?

>>
>> The auto-decoding problem was mostly that it couldn't be @nogc 
>> since it throws, but with further releases exception 
>> throwing will get @nogc. So it's getting fixed.

>
> I'd actually argue that that's the lesser of the problems 
> with auto-decoding. The big problem is that it's 
> auto-decoding. Code points are almost always the wrong level 
> to be operating at. The programmer needs to be in control of 
> whether the code is operating on code units, code points, or 
> graphemes, and because of auto-decoding, we have to 
> constantly avoid using the range primitives for arrays on 
> strings. Tons of range-based code has to special case for 
> strings in order to work around auto-decoding. We're 
> constantly fighting our own API in order to process strings 
> sanely and efficiently.


I'd agree with you; I hate the special casing. However, it seems 
to me this has been debated to death already, and that 
auto-decoding was successfully advocated by Alexandrescu et 
al., surviving the controversy years ago.


Most everyone who debated in favor of it early on is very much 
against it now (and I'm one of them). Experience and a better


I wasn't so much asking about auto-decoding in particular, more 
about the mentality and methods of breaking changes.


In a way any change to the compiler is a breaking change when it 
comes to the configuration.


I for one never expect code to compile on the latest compiler; it 
has to be the same compiler, same version, for the code base to 
work as expected.


At one point I envisioned every file with a header that states 
the version of the compiler required for that module. A 
sophisticated configuration tool could take and compile each 
module with its respective version and then one could link. (this 
could very well be the worst idea ever)


I'm not saying we should be quick to change... oh no, that would 
be very bad. But after you sit with the filth of your decisions 
long and hard and are certain that it is indeed bad, there should 
be a plan for action and change. And when it comes to change it 
should be an evolution, not a revolution.


It is good to avoid the so easily accepted mentality of legacy... 
Why do you do it that way? "It's because we've always done it 
that way."


The reason I like D is often that, driven by its community, it 
innovates and renovates into a language that is honestly really 
fun to use. (most of the time)


Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Henrik via Digitalmars-d

On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
On Thu, Mar 08, 2018 at 10:14:16AM -0700, Jonathan M Davis via 
Digitalmars-d wrote:

[...]

[...]

[...]

[...]

Yeah, the only reason autodecoding survived in the beginning 
was because Andrei (wrongly) thought that a Unicode code point 
was equivalent to a grapheme.  If that had been the case, the 
cost associated with auto-decoding may have been justifiable.  
Unfortunately, that is not the case, which greatly diminishes 
most of the advantages that autodecoding was meant to have.  So 
it ended up being something that incurred a significant 
performance hit, yet did not offer the advantages it was 
supposed to.  To fully live up to Andrei's original vision, it 
would have to include grapheme segmentation as well.  
Unfortunately, graphemes are of arbitrary length and cannot in 
general fit in a single dchar (or any fixed-size type), and 
grapheme segmentation is extremely costly to compute, so doing 
it by default would kill D's string manipulation performance.


[...]


Which companies are against changing this? They must be powerful 
indeed if their convenience is important enough to protect such 
destructive features. Even C++ managed to give up trigraphs 
against the will of IBM. Surely D can give up something that is 
even more destructive?




Re: Why not flag away the mistakes of the past?

2018-03-08 Thread H. S. Teoh via Digitalmars-d
On Thu, Mar 08, 2018 at 10:14:16AM -0700, Jonathan M Davis via Digitalmars-d 
wrote:
> On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d 
> wrote:
[...]
> > I'd agree with you; I hate the special casing. However, it seems to
> > me this has been debated to death already, and that auto-decoding
> > was successfully advocated by Alexandrescu et al., surviving the
> > controversy years ago.
> 
> Most everyone who debated in favor of it early on is very much against
> it now (and I'm one of them). Experience and a better understanding of
> Unicode has shown it to be a terrible idea. I question that you will
> find any significant contributor to Phobos who would choose to have it
> if we were starting from scratch, and most of the folks who post in
> the newsgroup agree with that.
[...]

Yeah, the only reason autodecoding survived in the beginning was because
Andrei (wrongly) thought that a Unicode code point was equivalent to a
grapheme.  If that had been the case, the cost associated with
auto-decoding may have been justifiable.  Unfortunately, that is not the
case, which greatly diminishes most of the advantages that autodecoding
was meant to have.  So it ended up being something that incurred a
significant performance hit, yet did not offer the advantages it was
supposed to.  To fully live up to Andrei's original vision, it would
have to include grapheme segmentation as well.  Unfortunately, graphemes
are of arbitrary length and cannot in general fit in a single dchar (or
any fixed-size type), and grapheme segmentation is extremely costly to
compute, so doing it by default would kill D's string manipulation
performance.

In hindsight, it was obviously a failure and a wrong design decision.
Walter is clearly against it after he learned that it comes with a hefty
performance cost, and even Andrei himself would admit today that it was
a mistake.  It's only that he, understandably, does not agree with any
change that would disrupt existing code. And that's what we're faced
with right now.


T

-- 
Frank disagreement binds closer than feigned agreement.


Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Jonathan M Davis via Digitalmars-d
On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d 
wrote:
> On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis
>
> wrote:
> > On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via
> >
> > Digitalmars-d wrote:
> >> On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist
> >>
> >> wrote:
> >> > That way the breaking change was easily fixable, and the
> >> > mistakes of the past not forever. Is it just the cost of
> >> > maintenance?
> >>
> >> The auto-decoding problem was mostly that it couldn't be @nogc
> >> since it throws, but with further releases exception throwing
> >> will get @nogc. So it's getting fixed.
> >
> > I'd actually argue that that's the lesser of the problems with
> > auto-decoding. The big problem is that it's auto-decoding. Code
> > points are almost always the wrong level to be operating at.
> > The programmer needs to be in control of whether the code is
> > operating on code units, code points, or graphemes, and because
> > of auto-decoding, we have to constantly avoid using the range
> > primitives for arrays on strings. Tons of range-based code has
> > to special case for strings in order to work around
> > auto-decoding. We're constantly fighting our own API in order
> > to process strings sanely and efficiently.
>
> I'd agree with you; I hate the special casing. However, it seems to
> me this has been debated to death already, and that auto-decoding
> was successfully advocated by Alexandrescu et al., surviving the
> controversy years ago.

Most everyone who debated in favor of it early on is very much against it
now (and I'm one of them). Experience and a better understanding of Unicode
has shown it to be a terrible idea. I question that you will find any
significant contributor to Phobos who would choose to have it if we were
starting from scratch, and most of the folks who post in the newsgroup agree
with that. The problem is what to do given that we don't want it and that no
one has come up with a way to remove it without breaking tons of code in the
process or even providing a clean migration path. So, given how difficult it
is to remove at this point, you'll find disagreement about how that should
be handled ranging from deciding that we're just stuck with it to wanting to
remove it regardless of the cost. But there seems to be almost universal
agreement now (certainly among the folks who might make such a decision)
that auto-decoding was a mistake. So, there's agreement that it would
ideally go, but there isn't agreement on what we should actually do given
the situation that we're in.

- Jonathan M Davis



Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Guillaume Piolat via Digitalmars-d
On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis 
wrote:
On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via 
Digitalmars-d wrote:

On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist

wrote:
> That way the breaking change was easily fixable, and the 
> mistakes of the past not forever. Is it just the cost of 
> maintenance?


The auto-decoding problem was mostly that it couldn't be @nogc 
since it throws, but with further releases exception throwing 
will get @nogc. So it's getting fixed.


I'd actually argue that that's the lesser of the problems with 
auto-decoding. The big problem is that it's auto-decoding. Code 
points are almost always the wrong level to be operating at. 
The programmer needs to be in control of whether the code is 
operating on code units, code points, or graphemes, and because 
of auto-decoding, we have to constantly avoid using the range 
primitives for arrays on strings. Tons of range-based code has 
to special case for strings in order to work around 
auto-decoding. We're constantly fighting our own API in order 
to process strings sanely and efficiently.


I'd agree with you; I hate the special casing. However, it seems to 
me this has been debated to death already, and that auto-decoding 
was successfully advocated by Alexandrescu et al., surviving the 
controversy years ago.


Re: Why not flag away the mistakes of the past?

2018-03-08 Thread Dukc via Digitalmars-d

On Wednesday, 7 March 2018 at 16:29:33 UTC, Seb wrote:

Well, I tried that already:

https://github.com/dlang/phobos/pull/5513

In short: very easy to do, but not much interest at the time.


No. The main problem with that (and with the idea of using a 
compiler flag in general) is that it affects the whole compilation. 
That means that every single third-party library, not only Phobos, 
has to work BOTH with and without the switch.


IMO, if we find a way to enable or disable autodecoding per 
module, not per compilation, that will make deprecating it more 
than worthwhile.


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Jon Degenhardt via Digitalmars-d

On Wednesday, 7 March 2018 at 16:33:25 UTC, Seb wrote:
On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt 
wrote:
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:

[...]


Auto-decoding is a significant issue for the applications I 
work on (search engines). There is a lot of string 
manipulation in these environments, and performance matters. 
Auto-decoding is a meaningful performance hit. Otherwise, 
Phobos has a very nice collection of algorithms for string 
manipulation. It would be great to have a way to turn 
auto-decoding off in Phobos.


Well, you can use byCodeUnit, which disables auto-decoding.

Though it's not well-known and rather annoying to explicitly 
add it almost everywhere.


I looked at this once. It didn't appear to be a viable solution, 
though I forget the details. I can probably resurrect them if 
that would be helpful.


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread H. S. Teoh via Digitalmars-d
On Wed, Mar 07, 2018 at 04:29:33PM +, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer wrote:
> > On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
> > > [...]
> > 
> > Note, autodecoding is NOT a feature of the language, but rather a
> > feature of Phobos.
> > 
> > It would be quite interesting I think to create a modified phobos
> > where autodecoding was optional and see what happens (could be
> > switched with a -version=autodecoding). It wouldn't take much effort
> > -- just take out the specializations for strings in std.array.
> > 
> > -Steve
> 
> Well, I tried that already:
> 
> https://github.com/dlang/phobos/pull/5513
> 
> In short: very easy to do, but not much interest at the time.

Argh... this really struck a nerve.  "Not much interest"?!  I think a
more accurate description is every passerby going "that looks dangerous
and I don't have enough time to spare to look into it right now, so
better just leave it up to somebody else to stick their neck out and get
beheaded by Andrei later", resulting in nobody taking apparent interest
in the PR, even though many of us *really* want to see autodecoding go
the way of the dodo.


T

-- 
EMACS = Extremely Massive And Cumbersome System


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread H. S. Teoh via Digitalmars-d
On Wed, Mar 07, 2018 at 04:33:25PM +, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
[...]
> > Auto-decoding is a significant issue for the applications I work on
> > (search engines). There is a lot of string manipulation in these
> > environments, and performance matters. Auto-decoding is a meaningful
> > performance hit. Otherwise, Phobos has a very nice collection of
> > algorithms for string manipulation. It would be great to have a way
> > to turn auto-decoding off in Phobos.
[...]
> Well, you can use byCodeUnit, which disables auto-decoding.
> 
> Though it's not well-known and rather annoying to explicitly add it
> almost everywhere.

And therein lies the rub: because it's *auto* decoding, rather than just
decoding, it's implicit everywhere, adding to the performance hit
without the coder necessarily being aware of it. You have to put in the
effort to add .byCodeUnit everywhere.

Worse yet, it gives the false sense of security that you're doing
Unicode "right", when actually that is *not* true at all, because a code
point is not equal to a grapheme (what people normally know as a
"character"). But because operating at the code point level *appears* to
be correct 80% of the time, bugs in string handling often go unnoticed,
unlike operating at the code unit level, where any Unicode handling bugs
are immediately obvious as soon as your string contains non-ASCII
characters.

So you're essentially paying the price of a significant performance hit
for the dubious benefit of non-100%-correct code, but with bugs
conveniently obscured so that it's harder to notice them.

Kill autodecoding, I say. Kill it with fire!!


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Seb via Digitalmars-d

On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:

[...]


Auto-decoding is a significant issue for the applications I 
work on (search engines). There is a lot of string manipulation 
in these environments, and performance matters. Auto-decoding 
is a meaningful performance hit. Otherwise, Phobos has a very 
nice collection of algorithms for string manipulation. It would 
be great to have a way to turn auto-decoding off in Phobos.


--Jon


Well, you can use byCodeUnit, which disables auto-decoding.

Though it's not well-known and rather annoying to explicitly add 
it almost everywhere.
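(For illustration, a small sketch of what byCodeUnit actually changes,
assuming a string containing a multi-byte UTF-8 character:)

```d
import std.utf : byCodeUnit;
import std.range : walkLength;

void main()
{
    string s = "héllo";  // 'é' occupies 2 UTF-8 code units

    assert(s.length == 6);                // .length counts code units
    assert(s.walkLength == 5);            // range iteration auto-decodes
    assert(s.byCodeUnit.walkLength == 6); // byCodeUnit skips decoding
}
```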




Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Seb via Digitalmars-d
On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer 
wrote:

On 3/7/18 1:00 AM, Taylor Hillegeist wrote:

[...]


Note, autodecoding is NOT a feature of the language, but rather 
a feature of Phobos.


It would be quite interesting I think to create a modified 
phobos where autodecoding was optional and see what happens 
(could be switched with a -version=autodecoding). It wouldn't 
take much effort -- just take out the specializations for 
strings in std.array.


-Steve


Well, I tried that already:

https://github.com/dlang/phobos/pull/5513

In short: very easy to do, but not much interest at the time.


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Jon Degenhardt via Digitalmars-d
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
So I've seen on the forum over the years arguments about 
auto-decoding (mostly) and some other things. Things that have 
been considered mistakes, and cannot be corrected because of 
the breaking changes it would create. And I always wonder why 
not make a solution to the tune of a flag that makes things 
work as they used to, and make the new behavior default.


dmd --UseAutoDecoding

That way the breaking change was easily fixable, and the 
mistakes of the past not forever. Is it just the cost of 
maintenance?


Auto-decoding is a significant issue for the applications I work 
on (search engines). There is a lot of string manipulation in 
these environments, and performance matters. Auto-decoding is a 
meaningful performance hit. Otherwise, Phobos has a very nice 
collection of algorithms for string manipulation. It would be 
great to have a way to turn auto-decoding off in Phobos.


--Jon


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Steven Schveighoffer via Digitalmars-d

On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
So I've seen on the forum over the years arguments about auto-decoding 
(mostly) and some other things. Things that have been considered 
mistakes, and cannot be corrected because of the breaking changes it 
would create. And I always wonder why not make a solution to the tune of 
a flag that makes things work as they used to, and make the new 
behavior default.


dmd --UseAutoDecoding

That way the breaking change was easily fixable, and the mistakes of the 
past not forever. Is it just the cost of maintenance?


Note, autodecoding is NOT a feature of the language, but rather a 
feature of Phobos.


It would be quite interesting I think to create a modified phobos where 
autodecoding was optional and see what happens (could be switched with a 
-version=autodecoding). It wouldn't take much effort -- just take out 
the specializations for strings in std.array.


-Steve


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, March 07, 2018 13:40:20 Nick Treleaven via Digitalmars-d 
wrote:
> On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis
>
> wrote:
> > I'd actually argue that that's the lesser of the problems with
> > auto-decoding. The big problem is that it's auto-decoding. Code
> > points are almost always the wrong level to be operating at.
>
> For me the fundamental problem is having char[] in the language
> at all, meaning a Unicode string. Arbitrary slicing and indexing
> are not Unicode compatible, if we revisit this we need a String
> type that doesn't support those operations. Plus the issue of
> string validation - a Unicode string type should be assumed to
> have valid contents - unsafe data should only be checked at
> string construction time, so iterating should always be nothrow.

In principle, char is supposed to be a UTF-8 code unit, and strings are
supposed to be validated up front rather than constantly validated, but it's
never been that way in practice.

Regardless, having char[] be sliceable is actually perfectly fine and
desirable. That's exactly what you want whenever you operate on code units,
and it's frequently the case that you want to be operating at the code unit
level. But the programmer needs to be able to reasonably control when code
units, code points, or graphemes are used, because each has their time and
place. If we had a string type, it would need to provide access to each of
those levels and likely would not be directly sliceable at all, because
slicing a string is kind of meaningless, because in principle, a string is
just one opaque piece of character data. It's when you're dealing at the code
unit, code point, or grapheme level that you actually start operating on
pieces of a string, and that means that the level that you're operating at
needs to be defined.

Having char[] be an array of code units works quite well, because then you
have efficiency by default. You then need to wrap it in another range type
when appropriate to get a range of code points or graphemes, or you need to
explicitly decode when appropriate. Whereas right now, what we have is
Phobos being "helpful" and constantly decoding for us such that we get
needlessly inefficient code, and it's at the code point level, which is
usually not the level you want to operate at. So, you don't have efficiency
or correctness.

Ultimately, it really doesn't work to hide the details of Unicode and not
have the programmer worry about code units, code points, and graphemes
unless you don't care about efficiency. As such, what we really need is to
cleanly give the programmer the tools to manage Unicode without the language
or library assuming what the programmer wants - especially assuming an
inefficient default. The language itself actually does a decent job of that.
It's Phobos that dropped the ball on that one, because Andrei didn't know
about graphemes and tried to make Phobos Unicode-correct by default.
Instead, we got inefficient and incorrect by default.
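(Each of the three levels described above can already be reached
explicitly in Phobos; a small sketch:)

```d
import std.utf : byCodeUnit;
import std.uni : byGrapheme;
import std.range : walkLength;

void main()
{
    string s = "e\u0301";  // 'e' plus a combining acute accent

    // Code units: what the char[] actually stores (3 UTF-8 bytes).
    assert(s.byCodeUnit.walkLength == 3);

    // Code points: what auto-decoding iterates by default.
    assert(s.walkLength == 2);

    // Graphemes: what a reader perceives as one "character".
    assert(s.byGrapheme.walkLength == 1);
}
```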

- Jonathan M Davis



Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Nick Treleaven via Digitalmars-d
On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis 
wrote:
I'd actually argue that that's the lesser of the problems with 
auto-decoding. The big problem is that it's auto-decoding. Code 
points are almost always the wrong level to be operating at.


For me the fundamental problem is having char[] in the language 
at all, meaning a Unicode string. Arbitrary slicing and indexing 
are not Unicode compatible, if we revisit this we need a String 
type that doesn't support those operations. Plus the issue of 
string validation - a Unicode string type should be assumed to 
have valid contents - unsafe data should only be checked at 
string construction time, so iterating should always be nothrow.
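(The slicing hazard described above can be sketched like this;
std.utf.validate throws on malformed UTF-8:)

```d
import std.utf : validate, UTFException;
import std.exception : assertThrown;

void main()
{
    string s = "héllo";  // 'é' spans code units 1 and 2

    // Slicing is by code unit; s[0 .. 2] cuts 'é' in half,
    // leaving an invalid UTF-8 fragment:
    assertThrown!UTFException(validate(s[0 .. 2]));

    // A slice that ends on a code point boundary is fine:
    validate(s[0 .. 1]);  // "h"
}
```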


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via Digitalmars-d 
wrote:
> On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist
>
> wrote:
> > That way the breaking change was easily fixable, and the
> > mistakes of the past not forever. Is it just the cost of
> > maintenance?
>
> The auto-decoding problem was mostly that it couldn't be @nogc since
> it throws, but with further releases exception throwing will get
> @nogc. So it's getting fixed.

I'd actually argue that that's the lesser of the problems with
auto-decoding. The big problem is that it's auto-decoding. Code points are
almost always the wrong level to be operating at. The programmer needs to be
in control of whether the code is operating on code units, code points, or
graphemes, and because of auto-decoding, we have to constantly avoid using
the range primitives for arrays on strings. Tons of range-based code has to
special case for strings in order to work around auto-decoding. We're
constantly fighting our own API in order to process strings sanely and
efficiently.

IMHO, @nogc and nothrow don't matter much in comparison. Yes, it would be
nice if range-based code operating on strings were @nogc and nothrow, but
most D code really doesn't care. It uses the GC anyway, and most of the
time, no exceptions are thrown, because the strings are valid Unicode. Yes,
the fact that the range primitives for strings throw UTFExceptions instead
of using the Unicode replacement character is a problem, but that problem is
small in comparison to the problems caused by the auto-decoding itself. Even
if front and popFront used the variant of decode that used the replacement
character, auto-decoding would still be a huge problem.
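(A sketch of the throwing behavior in question: an invalid lead byte
makes the auto-decoding front throw a UTFException rather than yield a
code unit:)

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : UTFException;

void main()
{
    // 0xFF can never appear in well-formed UTF-8.
    immutable(ubyte)[] bytes = [0xFF];
    auto bad = cast(string) bytes;

    // front auto-decodes, so it throws instead of returning a char:
    assertThrown!UTFException(bad.front);
}
```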

- Jonathan M Davis



Re: Why not flag away the mistakes of the past?

2018-03-07 Thread Guillaume Piolat via Digitalmars-d
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
That way the breaking change was easily fixable, and the 
mistakes of the past not forever. Is it just the cost of 
maintenance?


The auto-decoding problem was mostly that it couldn't be @nogc since 
it throws, but with further releases exception throwing will get 
@nogc. So it's getting fixed.


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread jmh530 via Digitalmars-d
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
So I've seen on the forum over the years arguments about 
auto-decoding (mostly) and some other things. Things that have 
been considered mistakes, and cannot be corrected because of 
the breaking changes it would create. And I always wonder why 
not make a solution to the tune of a flag that makes things 
work as they used to, and make the new behavior default.


dmd --UseAutoDecoding

That way the breaking change was easily fixable, and the 
mistakes of the past not forever. Is it just the cost of 
maintenance?


That's the approach used for most things, but there are a lot of 
things that rely on auto-decoding, so it would be a big effort to 
actually implement that.


Re: Why not flag away the mistakes of the past?

2018-03-07 Thread FeepingCreature via Digitalmars-d

For what it's worth, I like autodecoding.

I worry we could be in a situation where a moderate number of 
people are strong opponents and a lot of people are weak fans, 
none of whom individually care enough to post. Hopefully the D 
survey results will shed some light on this, though I don't 
remember whether it actually asked people's opinion of 
autodecoding or just listed it as a possible issue to raise, which 
would fall into the same trap.


Why not flag away the mistakes of the past?

2018-03-06 Thread Taylor Hillegeist via Digitalmars-d
So I've seen on the forum over the years arguments about 
auto-decoding (mostly) and some other things. Things that have 
been considered mistakes, and cannot be corrected because of the 
breaking changes it would create. And I always wonder why not 
make a solution to the tune of a flag that makes things work as 
they used to, and make the new behavior default.


dmd --UseAutoDecoding

That way the breaking change was easily fixable, and the mistakes 
of the past not forever. Is it just the cost of maintenance?