Re: More fun with autodecoding

2018-09-16 Thread Nick Sabalausky (Abscissa) via Digitalmars-d

On 09/15/2018 04:29 PM, Jonathan M Davis wrote:


> Adding any sort of Concepts feature to D
> would be very much at odds with DbI.


I'm not very familiar with C++'s attempted approaches to concepts, so 
maybe we're thinking of two different things by "concepts", but I don't 
see why it would be at odds with DbI. If anything, they would seem to 
complement each other well, each one filling in where the other hits its
limit. Even just a simple example:


void foo(ForwardRange r1, InputRange r2)
if(hasLength!r1)
{...}

Andrei is right that a no-DbI version of that would suck: Hierarchies 
are no good for a series of orthogonal options. But at the same time, 
the equivalent current-D code would comparatively be a mess, too.
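
For comparison, here is a minimal sketch of what the "equivalent current-D code" could look like. The name `foo` and the parameter names come from the example above; the exact set of traits is an assumption about what the concept-style signature was meant to require.

```d
import std.range.primitives : hasLength, isForwardRange, isInputRange;

// Current-D spelling of the hypothetical concept-style signature:
// every requirement has to move into the template constraint.
void foo(R1, R2)(R1 r1, R2 r2)
if (isForwardRange!R1 && hasLength!R1 && isInputRange!R2)
{
}

void main()
{
    int[] a = [1, 2, 3];
    foo(a, a); // arrays satisfy all three traits

    // 42 is not an input range, so no overload matches.
    static assert(!__traits(compiles, foo(a, 42)));
}
```

The requirements are all there, but the reader has to map each trait back to the parameter it constrains, which is the readability cost being discussed.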


Although...and maybe I'm just typing out of my &%@ here, maybe some kind 
of templated concept:


void foo(ForwardRange!WithLength r1, InputRange r2)
{...}



> Overall though, I don't think that there's really any disagreement that it
> would be very desirable to get the compiler to provide better information
> about which parts of a template constraint are true and which are false. The
> problem is really that someone needs to come up with a scheme to do so that
> will work reasonably well and then implement it, and no one has done that
> yet.


Agreed, but let's be realistic, this *is* D: How many years has it been 
since assertPred was rejected in favor of the improved-assert-messages 
vaporware? It's hard to have much faith in such a thing happening here 
either. (Not that I'm under any illusion that concept-like stuff would 
be any more likely.)


Though, I'd be glad to be proven wrong either way.

-- Danny Downer


Re: More fun with autodecoding

2018-09-15 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 15, 2018 9:31:00 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
> On 9/13/18 3:53 PM, H. S. Teoh wrote:
> > On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via 
Digitalmars-d wrote:
> >> On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
> >>> Then I found the true culprit was isForwardRange!R. This led me to
> >>> requestion my sanity, and finally realized I forgot the empty
> >>> function.
> >>
> >> This is one reason template-based interfaces like ranges should be
> >> required to declare themselves as deliberately implementing said
> >> interface. Sure, we can tell people they should always `static
> >> assert(isForwardRange!MyType)`, but that's coding by convention and
> >> clearly isn't always going to happen.
>
> No, please don't. I've used C# and Swift, and this sucks compared to
> duck typing.
>
> > Yeah, I find myself writing `static assert(isInputRange!MyType)` all the
> > time these days, because you just never can be too sure you didn't screw
> > up and cause things to mysteriously fail, even though they shouldn't.
> >
> > Although I used to be a supporter of free-form sig constraints (and
> > still am to some extent) and a hater of Concepts like in C++, more and
> > more I'm beginning to realize the wisdom of Concepts rather than
> > free-for-all ducktyping.  It's one of those things that work well in
> > small programs and fast, one-shot projects, but don't generalize so well
> > as you scale up to larger and larger projects.
>
> The problem I had was that it wasn't clear to me which constraint was
> failing. My bias brought me to "it must be autodecoding again!". But
> objectively, I should have examined all the constraints to see what was
> wrong. All C++ concepts seem to do (haven't used them) is help identify
> easier which requirements are failing.
>
> We can fix all these problems by simply identifying the constraint
> clauses that fail. By color coding the error message identifying which
> ones are true and which are false, we can pinpoint the error without
> changing the language.
>
> Once you fix the issue, it doesn't error any more, so the idea of duck
> typing and constraints is sound, it's just difficult to diagnose.

The other two things that come to mind are that

1. Design by Introspection is pretty much the opposite of Concepts, and
while I'm not convinced that DbI is a great idea in general, there clearly
are cases where it makes a lot of sense (e.g. allocators), and it's
something that Andrei wants to push (whereas unless something has changed,
he's very much against Concepts). Adding any sort of Concepts feature to D
would be very much at odds with DbI. And honestly, in general, I don't think
that it's at all necessary. As you point out, it's really the error
reporting that's the problem. Aside from that, template constraints tend to
work quite well.

2. Improving the error reporting for constraints improves templates in
general and not just those that use traits like isInputRange. While we do
create traits for the really common stuff, there's plenty of code that is
going to do stuff like is(typeof(...)), because it's a one-off thing, and it
would be overkill to create a trait for it. So, improving the error
reporting would ultimately be very useful in general, whereas trying to do
something with Concepts would only help with part of the problem.

And of course, there's always going with Atila's approach of providing a
separate template that goes with the trait and tells you which piece fails
for a particular template argument (though that obviously doesn't scale).
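
A rough sketch of that kind of companion template follows. `checkInputRange` and `NoEmpty` are hypothetical names used for illustration only, and this is not necessarily how Atila's library implements it. The idea is simply one static assert per primitive, so the compiler names the piece that is missing.

```d
import std.range.primitives;

// Hypothetical companion to isInputRange: roughly the same requirements,
// but one static assert per primitive so the failing piece is named.
void checkInputRange(R)()
{
    static assert(is(typeof((R r) => r.empty)),
                  R.stringof ~ " has no usable `empty`");
    static assert(is(typeof((R r) => r.front)),
                  R.stringof ~ " has no usable `front`");
    static assert(is(typeof((R r) => r.popFront())),
                  R.stringof ~ " has no usable `popFront`");
}

// A type that forgot `empty`.
struct NoEmpty
{
    @property int front() { return 0; }
    void popFront() {}
}

static assert(!isInputRange!NoEmpty);
static assert(!__traits(compiles, checkInputRange!NoEmpty()));
static assert(__traits(compiles, checkInputRange!(int[])()));

void main() {}
```

Calling `checkInputRange!NoEmpty()` outside of `__traits(compiles)` would stop the build at the first failing assert and print its message. The scaling problem is visible too: every trait needs a hand-written twin like this.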

Overall though, I don't think that there's really any disagreement that it
would be very desirable to get the compiler to provide better information
about which parts of a template constraint are true and which are false. The
problem is really that someone needs to come up with a scheme to do so that
will work reasonably well and then implement it, and no one has done that
yet.

- Jonathan M Davis





Re: More fun with autodecoding

2018-09-15 Thread Steven Schveighoffer via Digitalmars-d

On 9/15/18 12:04 PM, Neia Neutuladh wrote:

On Saturday, 15 September 2018 at 15:31:00 UTC, Steven Schveighoffer wrote:
The problem I had was that it wasn't clear to me which constraint was 
failing. My bias brought me to "it must be autodecoding again!". But 
objectively, I should have examined all the constraints to see what 
was wrong. All C++ concepts seem to do (haven't used them) is help 
identify easier which requirements are failing.


They also make it so your automated documentation can post a link to 
something that describes the type in more cases. std.algorithm would 
still be relatively horked, but a lot of functions could be declared as 
yielding, for instance, ForwardRange!(ElementType!(TRange)).


True, we currently rely on convention there. But this really is simply 
documentation at a different (admittedly more verified) level.




We can fix all these problems by simply identifying the constraint 
clauses that fail. By color coding the error message identifying which 
ones are true and which are false, we can pinpoint the error without 
changing the language.


I wish. I had a look at std.algorithm.searching.canFind as the first 
thing I thought to check. Its constraints are of the form:


     bool canFind(Range)(Range haystack)
     if (is(typeof(find!pred(haystack))))

The compiler can helpfully point out that the specific constraint that 
failed was is(...), which does absolutely no good in trying to track 
down the problem.


is(typeof(...)) constraints might be useless here, but we have started 
to move away from such things in general (see for instance isInputRange 
and friends).


But there could actually be a solution -- just recursively play out the 
items at compile time (probably with the verbose switch) to see what 
underlying cause there is.


Other than that, you can then write find(myrange) and see what comes up.

In my case even, the problem was hasSlicing, which itself is a 
complicated template, and wouldn't have helped me diagnose the real 
problem. A recursive display of what things failed would help, but even 
if I could trigger a way to diagnose hasSlicing, instead of copying all 
the constraints locally, it's still a much better situation.


I'm really thinking of exploring how this could play out, just toying 
with the compiler to do this would give me experience in how the thing 
works.


-Steve


Re: More fun with autodecoding

2018-09-15 Thread Neia Neutuladh via Digitalmars-d
On Saturday, 15 September 2018 at 15:31:00 UTC, Steven 
Schveighoffer wrote:
The problem I had was that it wasn't clear to me which 
constraint was failing. My bias brought me to "it must be 
autodecoding again!". But objectively, I should have examined 
all the constraints to see what was wrong. All C++ concepts 
seem to do (haven't used them) is help identify easier which 
requirements are failing.


They also make it so your automated documentation can post a link 
to something that describes the type in more cases. std.algorithm 
would still be relatively horked, but a lot of functions could be 
declared as yielding, for instance, 
ForwardRange!(ElementType!(TRange)).


We can fix all these problems by simply identifying the 
constraint clauses that fail. By color coding the error message 
identifying which ones are true and which are false, we can 
pinpoint the error without changing the language.


I wish. I had a look at std.algorithm.searching.canFind as the 
first thing I thought to check. Its constraints are of the form:


bool canFind(Range)(Range haystack)
if (is(typeof(find!pred(haystack))))

The compiler can helpfully point out that the specific constraint 
that failed was is(...), which does absolutely no good in trying 
to track down the problem.


Re: More fun with autodecoding

2018-09-15 Thread Steven Schveighoffer via Digitalmars-d

On 9/13/18 3:53 PM, H. S. Teoh wrote:

On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via 
Digitalmars-d wrote:

On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:


Then I found the true culprit was isForwardRange!R. This led me to
requestion my sanity, and finally realized I forgot the empty
function.


This is one reason template-based interfaces like ranges should be
required to declare themselves as deliberately implementing said
interface. Sure, we can tell people they should always `static
assert(isForwardRange!MyType)`, but that's coding by convention and
clearly isn't always going to happen.


No, please don't. I've used C# and Swift, and this sucks compared to 
duck typing.



Yeah, I find myself writing `static assert(isInputRange!MyType)` all the
time these days, because you just never can be too sure you didn't screw
up and cause things to mysteriously fail, even though they shouldn't.

Although I used to be a supporter of free-form sig constraints (and
still am to some extent) and a hater of Concepts like in C++, more and
more I'm beginning to realize the wisdom of Concepts rather than
free-for-all ducktyping.  It's one of those things that work well in
small programs and fast, one-shot projects, but don't generalize so well
as you scale up to larger and larger projects.


The problem I had was that it wasn't clear to me which constraint was 
failing. My bias brought me to "it must be autodecoding again!". But 
objectively, I should have examined all the constraints to see what was 
wrong. All C++ concepts seem to do (haven't used them) is help identify 
easier which requirements are failing.


We can fix all these problems by simply identifying the constraint 
clauses that fail. By color coding the error message identifying which 
ones are true and which are false, we can pinpoint the error without 
changing the language.


Once you fix the issue, it doesn't error any more, so the idea of duck 
typing and constraints is sound, it's just difficult to diagnose.


-Steve


Re: More fun with autodecoding

2018-09-13 Thread H. S. Teoh via Digitalmars-d
On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via 
Digitalmars-d wrote:
> On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
> > 
> > Then I found the true culprit was isForwardRange!R. This led me to
> > requestion my sanity, and finally realized I forgot the empty
> > function.
> 
> This is one reason template-based interfaces like ranges should be
> required to declare themselves as deliberately implementing said
> interface. Sure, we can tell people they should always `static
> assert(isForwardRange!MyType)`, but that's coding by convention and
> clearly isn't always going to happen.

Yeah, I find myself writing `static assert(isInputRange!MyType)` all the
time these days, because you just never can be too sure you didn't screw
up and cause things to mysteriously fail, even though they shouldn't.
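
A minimal sketch of that convention, using a made-up range type (`Countdown` is hypothetical, not something from the thread): the static asserts sit right next to the type, so a missing primitive is reported at the definition rather than at some distant call site.

```d
import std.range.primitives : isForwardRange, isInputRange;

// Hypothetical range that counts down from n to 1.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
    @property Countdown save() { return this; }
}

// Declare the intent next to the type; deleting `empty` or `save`
// above turns these into compile errors at this exact spot.
static assert(isInputRange!Countdown);
static assert(isForwardRange!Countdown);

void main()
{
    int sum;
    foreach (x; Countdown(3))
        sum += x;
    assert(sum == 6); // 3 + 2 + 1
}
```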

Although I used to be a supporter of free-form sig constraints (and
still am to some extent) and a hater of Concepts like in C++, more and
more I'm beginning to realize the wisdom of Concepts rather than
free-for-all ducktyping.  It's one of those things that work well in
small programs and fast, one-shot projects, but don't generalize so well
as you scale up to larger and larger projects.


T

-- 
A program should be written to model the concepts of the task it performs 
rather than the physical world or a process because this maximizes the 
potential for it to be applied to tasks that are conceptually similar and, more 
important, to tasks that have not yet been conceived. -- Michael B. Allen


Re: More fun with autodecoding

2018-09-13 Thread Nick Sabalausky (Abscissa) via Digitalmars-d

On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:


Then I found the true culprit was isForwardRange!R. This led me to 
requestion my sanity, and finally realized I forgot the empty function.


This is one reason template-based interfaces like ranges should be 
required to declare themselves as deliberately implementing said 
interface. Sure, we can tell people they should always `static 
assert(isForwardRange!MyType)`, but that's coding by convention and
clearly isn't always going to happen.


Re: More fun with autodecoding

2018-09-12 Thread jmh530 via Digitalmars-d
On Wednesday, 12 September 2018 at 12:45:15 UTC, Nicholas Wilson 
wrote:



Overloads:

[snip]


Good point.


Re: More fun with autodecoding

2018-09-12 Thread Nicholas Wilson via Digitalmars-d

On Tuesday, 11 September 2018 at 14:58:21 UTC, jmh530 wrote:

Is there any reason why this is not sufficient?

[1] https://run.dlang.io/is/lu6nQ0


Overloads:

https://run.dlang.io/is/m5HGOh

The static asserts being in the constraint affects the template 
candidacy viability. Being in the function body/runtime contract 
does not so you'll end up with


onlineapp.d(17): Error: onlineapp.foo called with argument types 
(float) matches both:

onlineapp.d(1): onlineapp.foo!float.foo(float x)
and:
onlineapp.d(7): onlineapp.foo!float.foo(float x)

despite the fact only one of them is viable, whereas bar is fine.
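
The linked snippets are not reproduced in the archive, so the following is only a hypothetical reconstruction of the point being made; `foo` and `bar` mirror the names in the error message, and the bodies are assumptions. Constraints take part in overload resolution, while static asserts inside the function do not.

```d
// Constraints are checked while choosing an overload, so only one
// candidate is viable for any given T.
void bar(T)(T x) if (is(T == int)) {}
void bar(T)(T x) if (is(T == float)) {}

// Static asserts in the body are checked too late: both templates
// match the call, and the compiler reports an ambiguity.
void foo(T)(T x) { static assert(is(T == int)); }
void foo(T)(T x) { static assert(is(T == float)); }

static assert( __traits(compiles, bar(1.0f)));
static assert(!__traits(compiles, foo(1.0f)));

void main()
{
    bar(1.0f); // picks the float overload
}
```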


Re: More fun with autodecoding

2018-09-12 Thread Steven Schveighoffer via Digitalmars-d

On 9/11/18 7:58 AM, jmh530 wrote:


Is there any reason why this is not sufficient?

[1] https://run.dlang.io/is/lu6nQ0


That's OK if you are the only one defining S. But what if float is 
handled elsewhere?


-Steve


Re: More fun with autodecoding

2018-09-11 Thread jmh530 via Digitalmars-d
On Tuesday, 11 September 2018 at 02:00:29 UTC, Nicholas Wilson 
wrote:

[snip]

https://github.com/dlang/DIPs/pull/131 will help narrow down 
the cause.


I like it, but I worry people would find multiple ifs confusing.

The first line of the comment is about using static asserts and 
in contracts, but it looks like static asserts are allowed in in 
contracts for functions [1]. You can do the same thing in 
structs/classes with invariant blocks (but in contracts are not 
allowed). So basically, the same behavior for if can be reduced 
to in contracts with static asserts already. Multiple ifs would 
just be a slightly less verbose way to accomplish the same thing.


I suppose one issue might be that contracts are not compiled in 
during release mode, but I think release only impacts normal 
asserts, not static asserts.


Is there any reason why this is not sufficient?

[1] https://run.dlang.io/is/lu6nQ0


Re: More fun with autodecoding

2018-09-11 Thread Nicholas Wilson via Digitalmars-d
On Tuesday, 11 September 2018 at 13:08:46 UTC, Steven 
Schveighoffer wrote:

On 9/10/18 7:00 PM, Nicholas Wilson wrote:
On Monday, 10 September 2018 at 20:44:46 UTC, Andrei 
Alexandrescu wrote:

On 9/10/18 12:46 PM, Steven Schveighoffer wrote:

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't 
allow splitting based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) 
Also my slicing operator accepted ints and not size_t.


I guess a better error message would be in order.


https://github.com/dlang/DIPs/pull/131 will help narrow down 
the cause.


While this would help eventually, I'd prefer something that 
just transforms all the existing code into useful error 
messages. See my response to Andrei.


-Steve


Please tell me where to get one of those!

But yeah, that DIP will tell you that hasSlicing is your problem
straight away. Extracting useful information to present to the 
user on why hasSlicing!R is false is much trickier for the same 
reason that providing useful information in the current template 
constraint format is hard: it is a bunch of potentially 
unstructured logic that has already been const-folded in order to 
evaluate it in the first place, so you can't re-evaluate it 
without flushing the template cache.


That's not to say that the situation can't be improved beyond 
what the DIP specifies, but I haven't had any brilliant ideas 
(and the Idea for that DIP was stolen from someone else anyway).


Re: More fun with autodecoding

2018-09-11 Thread Steven Schveighoffer via Digitalmars-d

On 9/10/18 7:00 PM, Nicholas Wilson wrote:

On Monday, 10 September 2018 at 20:44:46 UTC, Andrei Alexandrescu wrote:

On 9/10/18 12:46 PM, Steven Schveighoffer wrote:

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't allow 
splitting based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) Also my 
slicing operator accepted ints and not size_t.


I guess a better error message would be in order.


https://github.com/dlang/DIPs/pull/131 will help narrow down the cause.


While this would help eventually, I'd prefer something that just 
transforms all the existing code into useful error messages. See my 
response to Andrei.


-Steve


Re: More fun with autodecoding

2018-09-11 Thread Steven Schveighoffer via Digitalmars-d

On 9/10/18 1:44 PM, Andrei Alexandrescu wrote:

On 9/10/18 12:46 PM, Steven Schveighoffer wrote:

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't allow 
splitting based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) Also my 
slicing operator accepted ints and not size_t.


I guess a better error message would be in order.



A better error message would help prevent the painful diagnosis that I 
had to do to actually find the issue.


So the error I got was this:

source/bufref.d(346,36): Error: template 
std.algorithm.iteration.splitter cannot deduce function from argument 
types !()(Result, string), candidates are:
/Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/iteration.d(3792,6): 
   std.algorithm.iteration.splitter(alias pred = "a == b", Range, 
Separator)(Range r, Separator s) if (is(typeof(binaryFun!pred(r.front, 
s)) : bool) && (hasSlicing!Range && hasLength!Range || 
isNarrowString!Range))
/Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/iteration.d(4163,6): 
   std.algorithm.iteration.splitter(alias pred = "a == b", Range, 
Separator)(Range r, Separator s) if (is(typeof(binaryFun!pred(r.front, 
s.front)) : bool) && (hasSlicing!Range || isNarrowString!Range) && 
isForwardRange!Separator && (hasLength!Separator || 
isNarrowString!Separator))
/Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/iteration.d(4350,6): 
   std.algorithm.iteration.splitter(alias isTerminator, 
Range)(Range r) if (isForwardRange!Range && 
is(typeof(unaryFun!isTerminator(r.front
/Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/iteration.d(4573,6): 
   std.algorithm.iteration.splitter(C)(C[] s) if (isSomeChar!C)


This means I had to look at each line, figure out which overload I'm 
calling, and then copy all the constraints locally, seeing which ones 
were true and which ones false.


But it didn't stop there. The problem was hasSlicing!Range. If you look 
at hasSlicing, it looks like this:


enum bool hasSlicing(R) = isForwardRange!R
    && !isNarrowString!R
    && is(ReturnType!((R r) => r[1 .. 1].length) == size_t)
    && (is(typeof(lvalueOf!R[1 .. 1]) == R) || isInfinite!R)
    && (!is(typeof(lvalueOf!R[0 .. $])) || is(typeof(lvalueOf!R[0 .. $]) == R))
    && (!is(typeof(lvalueOf!R[0 .. $])) || isInfinite!R
        || is(typeof(lvalueOf!R[0 .. $ - 1]) == R))
    && is(typeof((ref R r)
    {
        static assert(isForwardRange!(typeof(r[1 .. 2])));
    }));

Now I had to instrument a whole slew of items. I pasted this whole thing 
into my code, added an alias to my range type for R, and then 
changed the big boolean expression to a bunch of static asserts.


Then I found the true culprit was isForwardRange!R. This led me to 
requestion my sanity, and finally realized I forgot the empty function.
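
The manual procedure described above can be sketched roughly as follows. `BufRef` is a hypothetical stand-in for the range in question (the real type is not shown in the thread), and the asserts are written so that the file compiles as it stands; in a real debugging session each clause would be a plain `static assert(clause)`, and the compiler would stop at the first false one.

```d
import std.range.primitives : hasSlicing, isForwardRange, isInputRange;

// Hypothetical stand-in for the range from the post: `empty` is missing.
struct BufRef
{
    string data;
    @property char front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property BufRef save() { return this; }
    BufRef opSlice(size_t lo, size_t hi) { return BufRef(data[lo .. hi]); }
}

alias R = BufRef;

// The combined trait only reports "false" ...
static assert(!hasSlicing!R);

// ... whereas one assert per clause isolates the cause.
static assert(!isForwardRange!R);  // first clause of hasSlicing fails
static assert(!isInputRange!R);    // because R is not even an input range
static assert( is(typeof((R r) => r.front)));      // front is fine
static assert( is(typeof((R r) => r.popFront()))); // popFront is fine
static assert(!is(typeof((R r) => r.empty)));      // empty is what is missing

void main() {}
```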


A fabulous fantastic mechanism that would have saved me some time is 
simply coloring the clauses of the template constraint that failed red, 
the ones that passed green, and the ones that weren't evaluated grey.


Furthermore, it would be good to recursively continue this for 
red clauses like `hasSlicing` which have so much underneath. Either that 
or a way to trigger the colored evaluation on demand.


If I were a dmd guru, I'd look at doing this myself. I may still try and 
hack it in just to see if I can do it.


--

Finally, there is a possible bug in the definition of hasSlicing: it 
doesn't require the slice parameters be size_t, but there are places 
(e.g. inside std.algorithm.searching.find) that pass in range.length .. 
range.length for slicing the range. In my implementation I had used ints 
as the parameters for opSlice. So I started seeing errors deep inside 
std.algorithm saying there was no overload for slicing. Again the sanity 
was questioned, and I figured out the error and now it's actually working.
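
A small sketch of that mismatch, with two hypothetical types (`IntSliced` and `SizeSliced` are made up for illustration): slicing with integer literals compiles either way, which is all the checks quoted above exercise, but slicing with `length` values needs `size_t` parameters on targets where `size_t` is wider than `int`.

```d
// Hypothetical type whose slice operator takes ints.
struct IntSliced
{
    int[] data;
    @property size_t length() const { return data.length; }
    IntSliced opSlice(int lo, int hi) { return IntSliced(data[lo .. hi]); }
}

// Same type, but with size_t slice parameters.
struct SizeSliced
{
    int[] data;
    @property size_t length() const { return data.length; }
    SizeSliced opSlice(size_t lo, size_t hi) { return SizeSliced(data[lo .. hi]); }
}

void main()
{
    auto a = IntSliced([1, 2, 3]);
    auto b = SizeSliced([1, 2, 3]);

    // Literal bounds fit in an int, so this compiles for both types.
    static assert(__traits(compiles, a[1 .. 2]));
    static assert(__traits(compiles, b[1 .. 2]));

    // Bounds of type size_t only work with the size_t overload when
    // size_t does not implicitly convert to int (e.g. 64-bit targets).
    static assert(__traits(compiles, b[b.length .. b.length]));
    static if (size_t.sizeof > int.sizeof)
        static assert(!__traits(compiles, a[a.length .. a.length]));

    assert(b[b.length .. b.length].length == 0);
}
```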


-Steve


Re: More fun with autodecoding

2018-09-10 Thread Nicholas Wilson via Digitalmars-d
On Monday, 10 September 2018 at 20:44:46 UTC, Andrei Alexandrescu 
wrote:

On 9/10/18 12:46 PM, Steven Schveighoffer wrote:

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't 
allow splitting based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) 
Also my slicing operator accepted ints and not size_t.


I guess a better error message would be in order.


https://github.com/dlang/DIPs/pull/131 will help narrow down the 
cause.


Re: More fun with autodecoding

2018-09-10 Thread Andrei Alexandrescu via Digitalmars-d

On 9/10/18 12:46 PM, Steven Schveighoffer wrote:

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't allow 
splitting based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) Also my 
slicing operator accepted ints and not size_t.


I guess a better error message would be in order.



Re: More fun with autodecoding

2018-09-10 Thread Steven Schveighoffer via Digitalmars-d

On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
I'll have to figure out why my specialized range doesn't allow splitting 
based on " ".


And the answer is: I'm an idiot. Forgot to define empty :) Also my 
slicing operator accepted ints and not size_t.


-Steve



Re: More fun with autodecoding

2018-09-10 Thread Steven Schveighoffer via Digitalmars-d

On 9/8/18 8:36 AM, Steven Schveighoffer wrote:

On 8/9/18 2:44 AM, Walter Bright wrote:

On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
Here's where I'm struggling -- because a string provides indexing, 
slicing, length, etc. but Phobos ignores that. I can't make a new 
type that does the same thing. Not only that, but I'm finding the 
specializations of algorithms only work on the type "string", and 
nothing else.


One of the worst things about autodecoding is it is special, it *only* 
steps in for strings. Fortunately, however, that specialness enabled 
us to save things with byCodePoint and byCodeUnit.


So it turns out that technically the problem here, even though it seemed 
like an autodecoding problem, is a problem with splitter.


splitter doesn't deal with encodings of character ranges at all.

For instance, when you have this:

"abc 123".byCodeUnit.splitter;

What happens is splitter only has one overload that takes one parameter, 
and that requires a character *array*, not a range.


So the byCodeUnit result is aliased-this to its original, and surprise! 
the elements from that splitter are string.


Next, I tried to use a parameter:

"abc 123".byCodeUnit.splitter(" ");

Nope, still devolves to string. It turns out it can't figure out how to 
split character ranges using a character array as input.


Hm... I made some erroneous assumptions in determining these problems.

1. There is no alias this for the source in ByCodeUnitImpl. I'm not sure 
how it was working when I tested before, but byCodeUnit definitely 
doesn't have it, and doesn't compile with the no-arg splitter call.
2. The .splitter(" ") does actually work and return a range of 
ByCodeUnitImpl elements.


So some of my analysis must have been based on bad testing.

However, the issue with the no-arg splitter is still there, and I still 
think it should be fixed.
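
For reference, a minimal sketch of the separator form that the thread reports as working. It passes the separator through `byCodeUnit` as well; the assertions only check that the pieces are not plain `string`s and that they hold the expected characters.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.utf : byCodeUnit;

void main()
{
    auto parts = "abc 123".byCodeUnit.splitter(" ".byCodeUnit);

    // The pieces stay code-unit ranges instead of devolving to string.
    static assert(!is(typeof(parts.front) == string));

    assert(equal(parts.front, "abc".byCodeUnit));
    parts.popFront();
    assert(equal(parts.front, "123".byCodeUnit));
    parts.popFront();
    assert(parts.empty);
}
```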


I'll have to figure out why my specialized range doesn't allow splitting 
based on " ".


-Steve


Re: More fun with autodecoding

2018-09-10 Thread Steven Schveighoffer via Digitalmars-d

On 9/8/18 8:36 AM, Steven Schveighoffer wrote:
I'll work on adding some issues to the tracker, and potentially doing 
some PRs so they can be fixed.


https://issues.dlang.org/show_bug.cgi?id=19238
https://github.com/dlang/phobos/pull/6700

-Steve



Re: More fun with autodecoding

2018-09-10 Thread Steven Schveighoffer via Digitalmars-d

On 9/10/18 1:45 AM, Chris wrote:

After a while your code will be cluttered with absurd stuff like this. 
`.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my experience with 
`splitter` et. al. I tried to create my own parser to have better 
control over every step.


I considered that, but I'm still trying to make this buffer reference 
thing work. Phobos just needs to be fixed. This is actually not as 
hopeless as I once thought. But what needs to happen is all of Phobos 
algorithms need to be tested with byCodeUnit et. al.


After a few *minutes* of testing things I ran 
into this bug [1] that didn't get fixed till early 2018. I never started 
to write my own step-by-step parser. I'm glad I didn't.


It actually was fixed accidentally in 2017 in this PR: 
https://github.com/dlang/druntime/pull/1952. The bug was closed in 2018 
when someone noticed the code no longer failed.


Essentially, the whole string switch algorithm was replaced with a 
completely rewritten better approach. This is a great example of why we 
should be moving more of the compiler magic into the library -- it's 
just easier to write and understand there.


I wish people began to realize that string handling is a basic necessity 
and that the correct handling of strings is of utmost importance. Please 
keep us updated on how things work out (or not) for you.


Absolutely, D needs to have great support for string parsing and 
manipulation. The potential is awesome.


I will keep it up. What I'm trying to fix is that using std.algorithm 
to extract pieces from a buffer, and then using the position in that 
buffer to determine things (i.e. parsing), is really difficult without 
some stupid requirements like pointer math.


[Please, nobody answer my post pointing out that a) we don't understand 
Unicode and b) that it's an insult to the Universe to draw attention to 
flaws that keep pestering us on an almost daily basis - without trying 
to fix them ourselves stante pede. As is clear from Steve's efforts, the 
Universe doesn't seem to care.]


I don't characterize it as the universe not caring. Phobos has a legacy 
problem with string handling, and it needs to somehow be addressed -- 
either by painfully extracting the problem, or painfully working around 
it. I don't think anyone here thinks there isn't a problem or that it's 
insulting to bring it up. But anything that needs to be done is painful 
either way, which is why it's not happening very fast.


-Steve


Re: More fun with autodecoding

2018-09-10 Thread Jonathan M Davis via Digitalmars-d
On Monday, September 10, 2018 2:45:27 AM MDT Chris via Digitalmars-d wrote:

> After a while your code will be cluttered with absurd stuff like
> this. `.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my
> experience with `splitter` et. al. I tried to create my own
> parser to have better control over every step. After a few
> *minutes* of testing things I ran into this bug [1] that didn't
> get fixed till early 2018. I never started to write my own
> step-by-step parser. I'm glad I didn't.
>
> [1] https://issues.dlang.org/show_bug.cgi?id=16739
>
> [snip]

I suspect that that didn't get found sooner simply because using
Unicode in a switch statement is rare. Usually, Unicode characters are found
in program input and not in the program itself. And grammars typically only
involve ASCII characters (even D, which supports Unicode characters in
identifiers, doesn't have any Unicode in any of its symbols). So, while I
completely agree that using Unicode in switch statements should work, it
doesn't really surprise me that it was broken. That's really a large part of
the Unicode problem. Regardless of how a particular language or library
attempts to make using Unicode sane, a large percentage of programmers don't
ever do anything with Unicode characters (even if their programs are often
used in environments where they will end up processing Unicode characters),
and even when a programmer's native tongue requires Unicode characters,
their programs frequently do not. So, it becomes very easy to write code
that doesn't work properly with Unicode and have no clue that it doesn't.

Fortunately, D does provide better tools than many languages for handling
Unicode, but the auto-decoding mess has made it considerably worse.

Still, even if we'd gotten it right, some portion of the code out there would
to have something like byCodeUnit, byCodePoint, or byGrapheme, because
efficient Unicode processing requires that you deal with all of that mess.
The code that doesn't have to do any of that is generally code that treats
strings as opaque data. Once you actually have to do string processing,
you're pretty much screwed.
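
To make the three levels concrete, here is a small sketch that counts the same text at each of them. The sample string (an "e" followed by a combining acute accent) is chosen for illustration and is not from the thread.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units
    assert(s.byDchar.walkLength == 2);    // code points
    assert(s.byGrapheme.walkLength == 1); // graphemes

    // Auto-decoding makes a plain string behave like the code-point view.
    assert(s.walkLength == 2);
}
```

Each step down the list does more work per element, which is the efficiency trade-off described above.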

Doing everything at the grapheme level would eliminate most of the problems
with regards to user-friendliness, but it would kill efficiency. So, as far
as I can tell, there really isn't a great solution to be had. Unicode is
simply too complicated and messy by its very nature. Now, we've definitely
made mistakes with Phobos that make it worse, but the only programs that are
going to avoid this whole mess either do so by not dealing with Unicode,
handling it incorrectly, or by handling it inefficiently. I think that it's
pretty much a pipe dream to be able to have completely sane and efficient
string handling using Unicode as it's currently defined.

Regardless, we need to do a better job of it in D than we have been.

- Jonathan M Davis





Re: More fun with autodecoding

2018-09-10 Thread Chris via Digitalmars-d
On Saturday, 8 September 2018 at 15:36:25 UTC, Steven 
Schveighoffer wrote:

On 8/9/18 2:44 AM, Walter Bright wrote:




So it turns out that technically the problem here, even though 
it seemed like an autodecoding problem, is a problem with 
splitter.


splitter doesn't deal with encodings of character ranges at all.

For instance, when you have this:

"abc 123".byCodeUnit.splitter;

What happens is splitter only has one overload that takes one 
parameter, and that requires a character *array*, not a range.


So the byCodeUnit result is aliased-this to its original, and 
surprise! the elements from that splitter are string.


Next, I tried to use a parameter:

"abc 123".byCodeUnit.splitter(" ");

Nope, still devolves to string. It turns out it can't figure 
out how to split character ranges using a character array as 
input.


The only thing that does seem to work is this:

"abc 123".byCodeUnit.splitter(" ".byCodeUnit);



After a while your code will be cluttered with absurd stuff like 
this. `.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my 
experience with `splitter` et al. I tried to create my own 
parser to have better control over every step. After a few 
*minutes* of testing things I ran into this bug [1] that didn't 
get fixed till early 2018. I never started to write my own 
step-by-step parser. I'm glad I didn't.


I wish people began to realize that string handling is a basic 
necessity and that the correct handling of strings is of utmost 
importance. Please keep us updated on how things work out (or 
not) for you.


[Please, nobody answer my post pointing out that a) we don't 
understand Unicode and b) that it's an insult to the Universe to 
draw attention to flaws that keep pestering us on an almost daily 
basis - without trying to fix them ourselves stante pede. As is 
clear from Steve's efforts, the Universe doesn't seem to care.]


[1] https://issues.dlang.org/show_bug.cgi?id=16739

[snip]


Re: More fun with autodecoding

2018-09-10 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 8, 2018 9:36:25 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
> On 8/9/18 2:44 AM, Walter Bright wrote:
> > On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
> >> Here's where I'm struggling -- because a string provides indexing,
> >> slicing, length, etc. but Phobos ignores that. I can't make a new type
> >> that does the same thing. Not only that, but I'm finding the
> >> specializations of algorithms only work on the type "string", and
> >> nothing else.
> >
> > One of the worst things about autodecoding is it is special, it *only*
> > steps in for strings. Fortunately, however, that specialness enabled us
> > to save things with byCodePoint and byCodeUnit.
>
> So it turns out that technically the problem here, even though it seemed
> like an autodecoding problem, is a problem with splitter.
>
> splitter doesn't deal with encodings of character ranges at all.
>
> For instance, when you have this:
>
> "abc 123".byCodeUnit.splitter;
>
> What happens is splitter only has one overload that takes one parameter,
> and that requires a character *array*, not a range.
>
> So the byCodeUnit result is aliased-this to its original, and surprise!
> the elements from that splitter are string.
>
> Next, I tried to use a parameter:
>
> "abc 123".byCodeUnit.splitter(" ");
>
> Nope, still devolves to string. It turns out it can't figure out how to
> split character ranges using a character array as input.
>
> The only thing that does seem to work is this:
>
> "abc 123".byCodeUnit.splitter(" ".byCodeUnit);
>
> But this goes against most algorithms in Phobos that deal with character
> ranges -- generally you can use any width character range, and it just
> works. Having a drop-in replacement for string would require splitter to
> handle these transcodings (and I think in general, algorithms should be
> able to handle them as well). Not only that, but the specialized
> splitter that takes no separator can split on multiple spaces, a feature
> I want to have for my drop-in replacement.
>
> I'll work on adding some issues to the tracker, and potentially doing
> some PRs so they can be fixed.

Well, plenty of algorithms don't care one whit about strings specifically
and thus their behavior is really dependent on what the element type of the
range is (e.g. for byCodeUnit, filter would filter code units, and sort
would sort code units, and arguably, that's what they should do). However, a
big problem with a number of the functions in Phobos that specifically
operate on ranges of characters is that they tend to assume that a range of
characters means a range of dchar. Some of the functions in Phobos have been
fixed to be more flexible and operate on arbitrary ranges of char, wchar, or
dchar, but it's mostly happened because of a bug report about a particular
function not working with something like byCodeUnit, whereas what we really
need to happen is to have tests added for all of the functions in Phobos
which specifically operate on ranges of characters to ensure that they do
the correct thing when given a range of char, wchar, dchar - or graphemes
(much as we talk about graphemes being the correct level for some types of
string processing, nothing in Phobos outside of std.uni currently does
anything with byGrapheme, even in tests).
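
A rough sketch of the kind of test described above, assuming `startsWith` as the function under test. The helper name is hypothetical; a real test suite would cover far more functions and also mix character widths.

```d
import std.algorithm.searching : startsWith;
import std.conv : to;
import std.meta : AliasSeq;
import std.utf : byCodeUnit;

/// Hypothetical helper: runs one Phobos function (startsWith) against
/// code-unit ranges of every character width.
bool startsWithWorksForAllWidths()
{
    static foreach (C; AliasSeq!(char, wchar, dchar))
    {{
        // Build the same text as string, wstring and dstring in turn.
        auto text = "abc 123".to!(immutable(C)[]);
        auto prefix = "abc".to!(immutable(C)[]);
        if (!text.byCodeUnit.startsWith(prefix.byCodeUnit))
            return false;
    }}
    return true;
}

unittest
{
    assert(startsWithWorksForAllWidths());
}
```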

And of course, with those tests, we'll inevitably find that a number of
those functions won't work correctly and will need to be fixed. But as
annoying as all of that is, it's work that needs to be done regardless of
the situation with auto-decoding, since these functions need to work with
arbitrary ranges of characters and not just ranges of dchar. And for those
functions that don't need to try to avoid auto-decoding, they should then
not even care whether strings are ranges of code units or code points, which
should then reduce the impact of auto-decoding. And actually, a lot of the
code that specializes on narrow strings to avoid auto-decoding would
probably work whether auto-decoding was there or not. So, once we've
actually managed to ensure that Phobos in general works with arbitrary
ranges of characters, the main breakage that would be caused by removing
auto-decoding (in Phobos at least) would be any code that used strings with
functions that weren't specifically written to do something special for
strings, and while I'm not at all convinced that we then have a path towards
removing auto-decoding, it would minimize auto-decoding's impact, and with
auto-decoding's impact minimized as much as possible, maybe at some point,
we'll actually manage to figure out how to remove it.

But in any case, the issues that you're running into with splitter are a
symptom of a larger problem with how Phobos currently handles ranges of
characters. And when this sort of thing comes up, I'm reminded that I should
take the time to start adding the appropriate tests to Phobos, and then I
never get around to it - as with too many things. I really should fix that.
:|

- Jonathan 

Re: More fun with autodecoding

2018-09-09 Thread Jon Degenhardt via Digitalmars-d
On Saturday, 8 September 2018 at 15:36:25 UTC, Steven 
Schveighoffer wrote:

On 8/9/18 2:44 AM, Walter Bright wrote:

On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
Here's where I'm struggling -- because a string provides 
indexing, slicing, length, etc. but Phobos ignores that. I 
can't make a new type that does the same thing. Not only 
that, but I'm finding the specializations of algorithms only 
work on the type "string", and nothing else.


One of the worst things about autodecoding is it is special, 
it *only* steps in for strings. Fortunately, however, that 
specialness enabled us to save things with byCodePoint and 
byCodeUnit.


So it turns out that technically the problem here, even though 
it seemed like an autodecoding problem, is a problem with 
splitter.


splitter doesn't deal with encodings of character ranges at all.


This could partially explain why when I tried byCodeUnit and 
friends awhile ago I concluded it wasn't a reasonable approach: 
splitter is in the middle of much of what I've written.


Even if splitter is changed I'll still be very doubtful about the 
byCodeUnit approach as a work-around. An automated way to 
validate that it is engaged only when necessary would be very 
helpful (@noautodecode perhaps? :))


--Jon



Re: More fun with autodecoding

2018-09-09 Thread Steven Schveighoffer via Digitalmars-d

On 8/9/18 2:44 AM, Walter Bright wrote:

On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
Here's where I'm struggling -- because a string provides indexing, 
slicing, length, etc. but Phobos ignores that. I can't make a new type 
that does the same thing. Not only that, but I'm finding the 
specializations of algorithms only work on the type "string", and 
nothing else.


One of the worst things about autodecoding is it is special, it *only* 
steps in for strings. Fortunately, however, that specialness enabled us 
to save things with byCodePoint and byCodeUnit.


So it turns out that technically the problem here, even though it seemed 
like an autodecoding problem, is a problem with splitter.


splitter doesn't deal with encodings of character ranges at all.

For instance, when you have this:

"abc 123".byCodeUnit.splitter;

What happens is splitter only has one overload that takes one parameter, 
and that requires a character *array*, not a range.


So the byCodeUnit result is aliased-this to its original, and surprise! 
the elements from that splitter are string.


Next, I tried to use a parameter:

"abc 123".byCodeUnit.splitter(" ");

Nope, still devolves to string. It turns out it can't figure out how to 
split character ranges using a character array as input.


The only thing that does seem to work is this:

"abc 123".byCodeUnit.splitter(" ".byCodeUnit);

But this goes against most algorithms in Phobos that deal with character 
ranges -- generally you can use any width character range, and it just 
works. Having a drop-in replacement for string would require splitter to 
handle these transcodings (and I think in general, algorithms should be 
able to handle them as well). Not only that, but the specialized 
splitter that takes no separator can split on multiple spaces, a feature 
I want to have for my drop-in replacement.
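
For reference, a small sketch of the one form reported above as working, where both the input and the separator go through `byCodeUnit`. The `countPieces` helper is made up for illustration, and the static assert reflects the behavior described in this message, so it may differ in other Phobos versions.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.range.primitives : walkLength;
import std.utf : byCodeUnit;

/// Hypothetical helper: number of pieces when splitting code units on a space.
size_t countPieces(string text)
{
    return text.byCodeUnit.splitter(" ".byCodeUnit).walkLength;
}

// With both arguments wrapped, the pieces stay code-unit ranges, not string.
static assert(!is(typeof("abc 123".byCodeUnit.splitter(" ".byCodeUnit).front) == string));

unittest
{
    assert(countPieces("abc 123") == 2);
    auto pieces = "abc 123".byCodeUnit.splitter(" ".byCodeUnit);
    assert(equal(pieces.front, "abc".byCodeUnit));
}
```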


I'll work on adding some issues to the tracker, and potentially doing 
some PRs so they can be fixed.


-Steve


Re: More fun with autodecoding

2018-09-09 Thread Steven Schveighoffer via Digitalmars-d

On 9/8/18 8:36 AM, Steven Schveighoffer wrote:


Sent this when I was on a plane, and for some reason it posted with the 
timestamp when I hit "send later", not when I connected just now. So 
this is to bring the previous message back to the forefront.


-Steve


Re: More fun with autodecoding

2018-08-09 Thread Jon Degenhardt via Digitalmars-d
On Wednesday, 8 August 2018 at 21:01:18 UTC, Steven Schveighoffer 
wrote:
Not trying to give too much away about the library I'm writing, 
but the problem I'm trying to solve is parsing out tokens from 
a buffer. I want to delineate the whole, as well as the parts, 
but it's difficult to get back to the original buffer once you 
split and slice up the buffer using phobos functions.


I wonder if there are some parallels in the tsv utilities I 
wrote. The tsv parser is extremely simple, byLine and splitter on 
a char buffer. Most of the tools just iterate the split result in 
order, but a couple do things like operate on a subset of fields, 
potentially reordered. For these a separate structure is created 
that maps back to the original buffer to avoid copying. Likely 
quite simple compared to what you are doing.


The csv2tsv tool may be more interesting. Parsing is relatively 
simple, mostly identifying field values in the context of CSV 
escape syntax. It's modeled as reading an infinite stream of 
utf-8 characters, byte-by-byte. Occasionally the bytes forming 
the value need to be modified due to the escape syntax, but most 
of the time the characters in the original buffer remain 
untouched and parsing is identifying the start and end positions.


The infinite stream is constructed by reading fixed size blocks 
from the input stream and concatenating them with joiner. This 
eliminates the need to worry about utf-8 characters spanning 
block boundaries, but it comes at a cost: either write 
byte-at-a-time, or make an extra copy (also byte-at-a-time). 
Making an extra copy is faster, that's what the code does. But, as 
a practical matter, most of the time large blocks could often be 
written directly from the original input buffer.
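
A simplified sketch of the block-joining idea, not the actual csv2tsv code. The `flattenBlocks` helper is hypothetical; a real tool would more likely get its blocks from something like `File.byChunk` and consume the joined range lazily. Here `array` stands in for the extra copy.

```d
import std.algorithm.iteration : joiner;
import std.array : array;
import std.string : representation;

/// Hypothetical helper: concatenates blocks of bytes into one array.
auto flattenBlocks(Blocks)(Blocks blocks)
{
    // joiner presents the blocks as one continuous stream of bytes;
    // array copies that stream element by element.
    return blocks.joiner.array;
}

unittest
{
    // "a", U+00E9 and "b" take 4 bytes in UTF-8; the two bytes of
    // U+00E9 (C3 A9) land on either side of the block boundary.
    immutable(ubyte)[] bytes = "a\u00E9b".representation;
    auto blocks = [bytes[0 .. 2], bytes[2 .. $]];
    assert(flattenBlocks(blocks) == bytes);
}
```

Because the consumer only ever sees single bytes, a character split across two blocks needs no special handling.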


If I wanted to make it faster than it currently is, I'd do this. But I 
don't see an easy way to do this with phobos ranges. At minimum 
I'd have to be able to run code when the joiner operation hits 
block boundaries. And it'd also be necessary to create a mapping 
back to the original input buffer.


Autodecoding comes into play of course. Basically, splitter on 
char arrays is fine, but in a number of cases it's necessary to 
work using ubyte to avoid the performance penalty.


--Jon


Re: More fun with autodecoding

2018-08-08 Thread Walter Bright via Digitalmars-d

On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
Here's where I'm struggling -- because a string provides indexing, slicing, 
length, etc. but Phobos ignores that. I can't make a new type that does the same 
thing. Not only that, but I'm finding the specializations of algorithms only 
work on the type "string", and nothing else.


One of the worst things about autodecoding is it is special, it *only* steps in 
for strings. Fortunately, however, that specialness enabled us to save things 
with byCodePoint and byCodeUnit.


Re: More fun with autodecoding

2018-08-08 Thread Steven Schveighoffer via Digitalmars-d

On 8/8/18 4:13 PM, Walter Bright wrote:

On 8/6/2018 6:57 AM, Steven Schveighoffer wrote:
But I'm not sure if the performance is going to be the same, since now 
it will likely FORCE autodecoding on the algorithms that have 
specialized versions to AVOID autodecoding (I think).


Autodecoding is expensive which is why the algorithms defeat it. Nearly 
none actually need it.


You can get decoding if needed by using .byDchar or .by!dchar (forgot 
which it was).


There are byCodePoint and byCodeUnit, and byCodePoint forces 
auto-decoding.


The problem is, I want to use this wrapper just like it was a string in 
all respects (including the performance gains had by ignoring 
auto-decoding).


Not trying to give too much away about the library I'm writing, but the 
problem I'm trying to solve is parsing out tokens from a buffer. I want 
to delineate the whole, as well as the parts, but it's difficult to get 
back to the original buffer once you split and slice up the buffer using 
phobos functions.


Consider that you are searching for something in a buffer. Phobos 
provides all you need to narrow down your range to the thing you are 
looking for. But it doesn't give you a way to figure out where you are 
in the whole buffer.


Up till now, I've done it by weird length math, but it gets tiring (see 
for instance: 
https://github.com/schveiguy/fastaq/blob/master/source/fasta/fasta.d#L125). 
I just want to know where the darned thing I've narrowed down is in the 
original range!
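
To make the "length math" concrete, here is a minimal sketch assuming a plain `string` buffer. The `offsetOf` helper is made up for illustration and is not taken from the linked fasta code.

```d
import std.algorithm.searching : find;

/// Hypothetical helper: offset (in code units) of the first occurrence of
/// needle in buf, or buf.length when needle is absent.
size_t offsetOf(string buf, string needle)
{
    // find returns the tail of buf that starts at the match,
    // so the offset has to be recovered from the two lengths.
    auto rest = buf.find(needle);
    return buf.length - rest.length;
}

unittest
{
    assert(offsetOf("hi there this is a sentence", "this") == 9);
}
```

This works for one call, but the bookkeeping has to be repeated after every narrowing step, which is the tedium described above.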


So this wrapper I thought would be a way to use things like you always 
do, but at any point, you just extract a piece of information (a buffer 
reference) that shows where it is in the original buffer. It's quite 
easy to do that part, the problem is getting it to be a drop-in 
replacement for the original type.


Here's where I'm struggling -- because a string provides indexing, 
slicing, length, etc. but Phobos ignores that. I can't make a new type 
that does the same thing. Not only that, but I'm finding the 
specializations of algorithms only work on the type "string", and 
nothing else.


I'll try using byCodeUnit and see how it fares.

-Steve


Re: More fun with autodecoding

2018-08-08 Thread Walter Bright via Digitalmars-d

On 8/6/2018 6:57 AM, Steven Schveighoffer wrote:
But I'm not sure if the performance is going to be the 
same, since now it will likely FORCE autodecoding on the algorithms that have 
specialized versions to AVOID autodecoding (I think).


Autodecoding is expensive which is why the algorithms defeat it. Nearly none 
actually need it.


You can get decoding if needed by using .byDchar or .by!dchar (forgot which it 
was).


Re: More fun with autodecoding

2018-08-08 Thread bauss via Digitalmars-d
On Monday, 6 August 2018 at 13:57:10 UTC, Steven Schveighoffer 
wrote:


I'm very tempted to start writing my own parsing utilities and 
avoid using Phobos algorithms...


-Steve


Oh yes; the good old autodecoding.


More fun with autodecoding

2018-08-06 Thread Steven Schveighoffer via Digitalmars-d
I wanted to share a story where I actually tried to add a new type with 
autodecoding and failed.


I want to create a wrapper type that forwards an underlying range type 
but adds one feature -- tracking in the original range where you were. 
This is in a new library I'm writing for parsing.


So my first idea was I will just forward all methods from a given range 
manually -- I need to override certain ones which affect the offset into 
the original range.


However, typically parsing is done from text.

I realized, strings are a range of dchar, but I need the length and 
other things forwarded so they can be drop-in replacements for strings 
(I treat strings and wstrings as character buffers in iopipe). However, 
phobos will then assume length() as the number of dchar elements, and 
assume it has indexing, etc.! Here is a case where I can't repeat the 
mistakes of phobos of auto-decoding for my own type! I never thought I'd 
have that problem...
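
The mismatch can be checked directly. A small sketch, assuming a Phobos version with auto-decoding still in place:

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange, walkLength;

// As a range, string yields dchar, and the range traits pretend it has
// neither a length nor random access...
static assert(is(ElementType!string == dchar));
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);

unittest
{
    // ...even though the array itself has a length, counted in code units.
    string s = "\u00E9";         // one code point, two UTF-8 code units
    assert(s.length == 2);
    assert(s.walkLength == 1);   // walks the range, so it counts code points
}
```

A user-defined type that exposes `length` and indexing gets no such special treatment, so Phobos takes those members at face value.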


So I thought, maybe I'll just alias this the underlying range and only 
override the parts that are needed. I end up with a nice tiny 
definition, and things are looking pretty good:


static struct Result
{
    private size_t pos;
    B _buffer;
    alias _buffer this;

    // implement the slice operations
    size_t[2] opSlice(size_t dim)(int start, int end) if (dim == 0)
    in
    { assert(start >= 0 && end <= _buffer.length); }
    do
    {
        return [start, end];
    }

    Result opIndex(size_t[2] dims)
    {
        return Result(pos + dims[0], _buffer[dims[0] .. dims[1]]);
    }

    void popFront()
    {
        import std.traits : isNarrowString;
        static if(isNarrowString!B)
        {
            auto prevLen = _buffer.length;
            _buffer.popFront;
            pos += prevLen - _buffer.length;
        }
        else
        {
            _buffer.popFront;
            ++pos;
        }
    }

    // the specialized buffer reference accessor.
    @property auto bufRef()
    {
        return BufRef(pos, _buffer.length);
    }
}

Note already the sucky part in popFront.
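
The reason for the before/after length comparison is that one `popFront` on a narrow string can remove several code units. A minimal sketch, with a made-up helper name:

```d
import std.range.primitives : popFront;

/// Hypothetical helper: how many code units a single popFront removes.
/// The string must not be empty.
size_t unitsConsumedByPopFront(string s)
{
    immutable before = s.length;
    s.popFront();            // auto-decoding: drops one whole code point
    return before - s.length;
}

unittest
{
    assert(unitsConsumedByPopFront("abc") == 1);
    assert(unitsConsumedByPopFront("\u00E9a") == 2);     // U+00E9 is 2 bytes
    assert(unitsConsumedByPopFront("\U0001F600") == 4);  // U+1F600 is 4 bytes
}
```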

But then I got a surprise when I went to use it:

import std.algorithm : splitter;
auto buf = "hi there this is a sentence";
auto split1 = buf.bwin.splitter; // specialized split range
auto split2 = buf.splitter; // normal split range
while(!split1.empty)
{
    assert(split1.front == split2.front);
    assert(split1.front.bufRef.concrete(buf) == split2.front); // FAILS!

    split1.popFront;
    split2.popFront;
}

What happened? It turns out, the splitter looks for length and indexing 
*OR* that it is a narrow string. Splitter is trying to ignore the fact 
that Phobos forces autodecoding on char arrays to achieve performance. 
With this taken into account, I think my type does not pass any of the 
constraints for any of the overloads (not 100% sure on that), so it 
devolves to just using the alias this'd element directly, completely 
circumventing the point of my wrapper. The error I get is "no member 
`bufRef` for type `string`".
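
The fallback can be reproduced without splitter. In this sketch, `Tracked` and `dropFirst` are hypothetical stand-ins for the wrapper and for an overload that only accepts `string`:

```d
import std.traits : isConvertibleToString, isSomeString;

/// Hypothetical wrapper, much simpler than the Result type above.
struct Tracked
{
    string text;
    size_t pos;
    alias text this;
}

/// Stand-in for an algorithm overload that only accepts string.
string dropFirst(string s)
{
    return s[1 .. $];
}

// The wrapper is not itself a string type, but converts to one implicitly...
static assert(!isSomeString!Tracked);
static assert(isConvertibleToString!Tracked);

// ...so the call compiles, and the result is a bare string: pos is lost.
static assert(is(typeof(dropFirst(Tracked("abc", 0))) == string));

unittest
{
    assert(dropFirst(Tracked("abc", 0)) == "bc");
}
```

The compiler raises no error at the call site. The wrapper simply vanishes, and the failure only shows up later when a wrapper-only member such as `bufRef` is used on the result.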


My next attempt will be to use byCodeUnit when I detect a narrow string, 
which hopefully will work OK. But I'm not sure if the performance is 
going to be the same, since now it will likely FORCE autodecoding on the 
algorithms that have specialized versions to AVOID autodecoding (I think).


I'm very tempted to start writing my own parsing utilities and avoid 
using Phobos algorithms...


-Steve