Re: What type does byGrapheme() return?

2020-01-06 Thread Alex via Digitalmars-d-learn

On Monday, 6 January 2020 at 08:39:19 UTC, Robert M. Münch wrote:

On 2020-01-05 04:18:34 +, H. S. Teoh said:

At a minimum, I think we should file a bug report to investigate whether
Grapheme.opSlice can be implemented differently, such that we avoid this
obscure referential behaviour that makes it hard to work with in complex
code. I'm not sure if this is possible, but IMO we should at least
investigate the possibilities.


Done... my first bug report :-) I copied together all the 
findings from this thread.


For the sake of completeness
https://issues.dlang.org/show_bug.cgi?id=20483


Re: What type does byGrapheme() return?

2020-01-06 Thread Robert M. Münch via Digitalmars-d-learn

On 2020-01-05 04:18:34 +, H. S. Teoh said:


At a minimum, I think we should file a bug report to investigate whether
Grapheme.opSlice can be implemented differently, such that we avoid this
obscure referential behaviour that makes it hard to work with in complex
code. I'm not sure if this is possible, but IMO we should at least
investigate the possibilities.


Done... my first bug report :-) I copied together all the findings from 
this thread.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: What type does byGrapheme() return?

2020-01-04 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Jan 04, 2020 at 08:19:14PM +0100, Robert M. Münch via 
Digitalmars-d-learn wrote:
> On 2019-12-31 21:36:56 +, Steven Schveighoffer said:
> 
> > The fact that a Grapheme's return requires you keep the grapheme in
> > scope for operations seems completely incorrect and dangerous IMO
> > (note that operators are going to always have a ref this, even when
> > called on an rvalue). So even though using ref works, I think the
> > underlying issue here really is the lifetime problem.
> 
> Thanks for all the answers, pretty enlightening, even if I'm not sure I
> get everything 100%.
> 
> So, what to do for now? File a bug-report? What needs to be fixed?

At a minimum, I think we should file a bug report to investigate whether
Grapheme.opSlice can be implemented differently, such that we avoid this
obscure referential behaviour that makes it hard to work with in complex
code. I'm not sure if this is possible, but IMO we should at least
investigate the possibilities.


> I'm using the ref approach for now, in the hope it will be OK for my
> use-case, which is converting a wstring to a Grapheme[], altering the
> array, and mapping it back to a wstring. Sounds like a lot of processing
> for handling Unicode text, but I don't think it gets a lot simpler
> than that.
[...]

Unicode is a beast. Be glad that you can even do this in the first
place.  If I were writing this in C, I wouldn't even know where to
begin!


T

-- 
No! I'm not in denial!


Re: What type does byGrapheme() return?

2020-01-04 Thread Robert M. Münch via Digitalmars-d-learn

On 2019-12-31 21:36:56 +, Steven Schveighoffer said:

The fact that a Grapheme's return requires you keep the grapheme in 
scope for operations seems completely incorrect and dangerous IMO (note 
that operators are going to always have a ref this, even when called on 
an rvalue). So even though using ref works, I think the underlying 
issue here really is the lifetime problem.


Thanks for all the answers, pretty enlightening, even if I'm not sure I 
get everything 100%.


So, what to do for now? File a bug-report? What needs to be fixed?

I'm using the ref approach for now, in the hope it will be OK for my 
use-case, which is converting a wstring to a Grapheme[], altering the 
array, and mapping it back to a wstring. Sounds like a lot of processing 
for handling Unicode text, but I don't think it gets a lot simpler than 
that.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: What type does byGrapheme() return?

2019-12-31 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Dec 31, 2019 at 04:36:56PM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:
> On 12/31/19 4:22 PM, H. S. Teoh wrote:
[...]
> > import std;
> > void main() {
> > auto x = "Bla\u0301hbla\u0310h\u0309!";
> > auto r = x.byGrapheme;
> > writefln("%s", r.map!((ref g) => g[]).joiner.to!string);
> > }
[...]
> > What did I do wrong?
> 
> auto r = x.byGrapheme.array;

Haha, in my hurry I totally forgot about the .array. Mea culpa.


[...]
> The fact that a Grapheme's return requires you keep the grapheme in
> scope for operations seems completely incorrect and dangerous IMO
> (note that operators are going to always have a ref this, even when
> called on an rvalue). So even though using ref works, I think the
> underlying issue here really is the lifetime problem.
[...]

After my wrong recollection of the history surrounding indexOf vs.
countUntil, I'm not sure I can rely on my memory anymore, :-P but AIUI
Dmitri implemented it this way because he wanted to avoid allocations
(GC or otherwise) in the most common case of Grapheme containing just a
small number of code points (usually 1 or 2). When the number of
combining diacritics exceeds the size of the Grapheme struct, it
quietly switches to malloc or some such for holding the data.  My
guess is that this is the reason for passing a pointer to the Grapheme
to the wrapper range returned by opSlice(). And possibly it's also to
allow mutation of the Grapheme via the returned slice?

Perhaps this whole approach should be looked at again. Certainly, unless
I'm missing something, it *ought* to be possible to implement Grapheme
in a way that doesn't require this scoped reference business.


T

-- 
The diminished 7th chord is the most flexible and fear-instilling chord. Use it 
often, use it unsparingly, to subdue your listeners into submission!


Re: What type does byGrapheme() return?

2019-12-31 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/31/19 4:22 PM, H. S. Teoh wrote:

On Tue, Dec 31, 2019 at 04:02:47PM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:

On 12/31/19 2:58 PM, H. S. Teoh wrote:

On Tue, Dec 31, 2019 at 09:33:14AM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:

e.g.:

writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);

[...]

Unfortunately this doesn't work. Somehow the ref parameter doesn't
match whatever it is std.algorithm.map is trying to pass to it.


Huh, it seemed to work for me. Got the full "Robert" with an R. map
does support ref-ness. Maybe you didn't put ref in the right place?


Here's my full non-working code:

    import std;
    void main() {
        auto x = "Bla\u0301hbla\u0310h\u0309!";
        auto r = x.byGrapheme;
        writefln("%s", r.map!((ref g) => g[]).joiner.to!string);
    }

The compiler says:

/usr/src/d/phobos/std/algorithm/iteration.d(604): Error: template D main.__lambda1 cannot deduce function from argument types !()(Grapheme), candidates are:
test.d(5):        __lambda1
/usr/src/d/phobos/std/algorithm/iteration.d(499): Error: template instance test.main.MapResult!(__lambda1, Result!string) error instantiating
test.d(5):        instantiated from here: map!(Result!string)

What did I do wrong?


auto r = x.byGrapheme.array;

This is how Robert originally had it if you look a few messages up.

Otherwise, it's not an lvalue.
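
For reference, a minimal sketch of the variant with .array added. The `rejoin` name is a hypothetical helper introduced only for illustration, and the sketch assumes the behaviour reported in this thread: a `ref` lambda needs lvalue elements, which the array provides and the plain byGrapheme range does not.

```d
import std.algorithm.iteration : joiner, map;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper: split into graphemes, then join the code points again.
string rejoin(string x)
{
    // .array stores the Graphemes, so each element is an lvalue that
    // outlives the slice wrapper returned by g[].
    auto r = x.byGrapheme.array;
    return r.map!((ref g) => g[]).joiner.to!string;
}

void main()
{
    writeln(rejoin("Bla\u0301hbla\u0310h\u0309!"));
}
```

The slices still point into `r`, so `r` has to stay alive until the conversion to string has finished.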

The fact that a Grapheme's return requires you keep the grapheme in 
scope for operations seems completely incorrect and dangerous IMO (note 
that operators are going to always have a ref this, even when called on 
an rvalue). So even though using ref works, I think the underlying issue 
here really is the lifetime problem.


-Steve


Re: What type does byGrapheme() return?

2019-12-31 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Dec 31, 2019 at 04:02:47PM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:
> On 12/31/19 2:58 PM, H. S. Teoh wrote:
> > On Tue, Dec 31, 2019 at 09:33:14AM -0500, Steven Schveighoffer via 
> > Digitalmars-d-learn wrote:
> > > e.g.:
> > > 
> > > writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);
> > [...]
> > 
> > Unfortunately this doesn't work. Somehow the ref parameter doesn't
> > match whatever it is std.algorithm.map is trying to pass to it.
> 
> Huh, it seemed to work for me. Got the full "Robert" with an R. map
> does support ref-ness. Maybe you didn't put ref in the right place?

Here's my full non-working code:

    import std;
    void main() {
        auto x = "Bla\u0301hbla\u0310h\u0309!";
        auto r = x.byGrapheme;
        writefln("%s", r.map!((ref g) => g[]).joiner.to!string);
    }

The compiler says:

/usr/src/d/phobos/std/algorithm/iteration.d(604): Error: template D main.__lambda1 cannot deduce function from argument types !()(Grapheme), candidates are:
test.d(5):        __lambda1
/usr/src/d/phobos/std/algorithm/iteration.d(499): Error: template instance test.main.MapResult!(__lambda1, Result!string) error instantiating
test.d(5):        instantiated from here: map!(Result!string)

What did I do wrong?


T

-- 
What do you get if you drop a piano down a mineshaft? A flat minor.


Re: What type does byGrapheme() return?

2019-12-31 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/31/19 2:58 PM, H. S. Teoh wrote:

On Tue, Dec 31, 2019 at 09:33:14AM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:

e.g.:

writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);

[...]

Unfortunately this doesn't work. Somehow the ref parameter doesn't match
whatever it is std.algorithm.map is trying to pass to it.


Huh, it seemed to work for me. Got the full "Robert" with an R. map does 
support ref-ness. Maybe you didn't put ref in the right place?


-Steve


Re: What type does byGrapheme() return?

2019-12-31 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Dec 31, 2019 at 09:33:14AM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:
> On 12/30/19 6:31 PM, H. S. Teoh wrote:
> > On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via 
> > Digitalmars-d-learn wrote:
[...]
> > Haha, it's actually right there in the Grapheme docs for the opSlice
> > overloads:
> > 
> >  Random-access range over Grapheme's $(CHARACTERS).
> > 
> >  Warning: Invalidates when this Grapheme leaves the scope,
> >  attempts to use it then would lead to memory corruption.
> > 
> > Looks like when you use .map over the Grapheme, it gets copied into
> > a temporary, which gets invalidated when map.front returns.
> > Somewhere we're missing a 'scope' qualifier...
[...]
> Then the original example should be fixable by putting "ref" in for
> all the lambdas.
> 
> But this is kind of disturbing. Why does the grapheme do this? The
> original data is not scoped.

Honestly I have no idea, but glancing at the code in std.uni reveals
that the returned slice is actually a wrapper object that contains a
pointer to the parent Grapheme object.  So if the parent was a
temporary and goes out of scope before the wrapper does, we're left with
a dangling pointer.
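
The shape of the problem can be sketched with two stand-in types. `Parent` and `View` are hypothetical names, not the std.uni types; the only assumption carried over is that the slice wrapper stores a pointer to the object it was sliced from.

```d
import std.stdio : writeln;

// Hypothetical wrapper: it owns no data, it only points back at its parent.
struct View
{
    Parent* parent;
    int front() { return parent.data[0]; }
}

// Hypothetical parent type whose opSlice hands out a View of itself.
struct Parent
{
    int[2] data;
    View opSlice() { return View(&this); }
}

void main()
{
    auto p = Parent([1, 2]);
    auto v = p[];       // v stores the address of p
    p.data[0] = 42;
    writeln(v.front);   // prints 42: the view reads through the pointer
    // If p were a temporary, v.parent would dangle once it is destroyed.
}
```

This stays valid only because `p` is a named local that outlives `v`; that is the condition the opSlice warning in the Grapheme docs describes.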


> e.g.:
> 
> writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);
[...]

Unfortunately this doesn't work. Somehow the ref parameter doesn't match
whatever it is std.algorithm.map is trying to pass to it.


T

-- 
They say that "guns don't kill people, people kill people." Well I think the 
gun helps. If you just stood there and yelled BANG, I don't think you'd kill 
too many people. -- Eddie Izzard, Dressed to Kill


Re: What type does byGrapheme() return?

2019-12-31 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/30/19 6:31 PM, H. S. Teoh wrote:

On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]

I suspect the cause is that whatever Grapheme.opSlice returns is going
out-of-scope when used with .map, that's why it's malfunctioning.

[...]

Haha, it's actually right there in the Grapheme docs for the opSlice
overloads:

 Random-access range over Grapheme's $(CHARACTERS).

 Warning: Invalidates when this Grapheme leaves the scope,
 attempts to use it then would lead to memory corruption.

Looks like when you use .map over the Grapheme, it gets copied into a
temporary, which gets invalidated when map.front returns.  Somewhere
we're missing a 'scope' qualifier...



Then the original example should be fixable by putting "ref" in for all 
the lambdas.


But this is kind of disturbing. Why does the grapheme do this? The 
original data is not scoped.


e.g.:

writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);

-Steve




Re: What type does byGrapheme() return?

2019-12-30 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 30, 2019 at 03:31:31PM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
> On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn 
> wrote:
> [...]
> > I suspect the cause is that whatever Grapheme.opSlice returns is
> > going out-of-scope when used with .map, that's why it's
> > malfunctioning.
> [...]
> 
> Haha, it's actually right there in the Grapheme docs for the opSlice
> overloads:
> 
> Random-access range over Grapheme's $(CHARACTERS).
> 
> Warning: Invalidates when this Grapheme leaves the scope,
> attempts to use it then would lead to memory corruption.
> 
> Looks like when you use .map over the Grapheme, it gets copied into a
> temporary, which gets invalidated when map.front returns.  Somewhere
> we're missing a 'scope' qualifier...
[...]

Indeed, compiling with dmd -dip1000 produces this error message:

test.d(15): Error: returning g.opSlice() escapes a reference to parameter g, perhaps annotate with return
/usr/src/d/phobos/std/algorithm/iteration.d(499):        instantiated from here: MapResult!(__lambda1, Grapheme[])
test.d(15):        instantiated from here: map!(Grapheme[])

Not the most helpful message (the annotation has to go in Phobos code,
not in user code), but it does at least point to the cause of the
problem.


T

-- 
What doesn't kill me makes me stranger.


Re: What type does byGrapheme() return?

2019-12-30 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> I suspect the cause is that whatever Grapheme.opSlice returns is going
> out-of-scope when used with .map, that's why it's malfunctioning.
[...]

Haha, it's actually right there in the Grapheme docs for the opSlice
overloads:

Random-access range over Grapheme's $(CHARACTERS).

Warning: Invalidates when this Grapheme leaves the scope,
attempts to use it then would lead to memory corruption.

Looks like when you use .map over the Grapheme, it gets copied into a
temporary, which gets invalidated when map.front returns.  Somewhere
we're missing a 'scope' qualifier...


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune


Re: What type does byGrapheme() return?

2019-12-30 Thread H. S. Teoh via Digitalmars-d-learn
On Sun, Dec 29, 2019 at 01:19:09PM +0100, Robert M. Münch via 
Digitalmars-d-learn wrote:
> On 2019-12-27 19:44:59 +, H. S. Teoh said:
[...]
> > If you want to add/delete/change graphemes, what you *really* want
> > is to use an array of Graphemes:
> > 
> > Grapheme[] editableGraphs;
> > 
> > You can then splice it, insert stuff, delete stuff, whatever.
> > 
> > When you're done with it, convert it back to string with something
> > like this:
> > 
> > string result = editableGraphs.map!(g => g[]).joiner.to!string;
> 
> I played around with this approach...
> 
> string r1 = "Robert M. Münch";
>   // Code-Units  = 16
>   // Code-Points = 15
>   // Graphemes   = 15
> 
> Grapheme[] gr1 = r1.byGrapheme.array;
> writeln(" Text = ", gr1.map!(g => g[]).joiner.to!string);
>   //  Text = obert M. Münch
> writeln("wText = ", gr1.map!(g => g[]).joiner.to!wstring);
>   //  wText = obert M. Münch
> writeln("dText = ", gr1.map!(g => g[]).joiner.to!dstring);
>   //  dText = obert M. Münch
> 
> Why is the first letter missing? Is this a bug?
[...]

I suspect there's a scope-related bug/issue somewhere here.  I did some
experiments and discovered that using foreach to iterate over a
Grapheme[] is OK, but somehow when using Grapheme[] with .map to slice
over each one, I get random UTF-8 encoding errors and missing
characters.

I suspect the cause is that whatever Grapheme.opSlice returns is going
out-of-scope when used with .map, that's why it's malfunctioning. The
last time I looked at the Grapheme code, there's a bunch of
memory-related stuff involving dtors that's *probably* the cause of this
problem.

Please file a bug for this.


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL


Re: What type does byGrapheme() return?

2019-12-30 Thread Alexandru Ermicioi via Digitalmars-d-learn
On Friday, 27 December 2019 at 17:26:58 UTC, Robert M. Münch 
wrote:

...


There is a set of range interfaces that can be used to hide the 
concrete range type. See 
https://dlang.org/library/std/range/interfaces/input_range.html 
as a starting point, and 
https://dlang.org/library/std/range/interfaces/input_range_object.html for wrapping any range in those interfaces.


Note: the resulting wrapped range is an object and has reference 
semantics. Beware of passing it directly to other range 
algorithms, as they can consume your range.


Best regards,
Alexandru.


Re: What type does byGrapheme() return?

2019-12-29 Thread Robert M. Münch via Digitalmars-d-learn

On 2019-12-27 19:44:59 +, H. S. Teoh said:


Since graphemes are variable-length in terms of code points, you can't
exactly *edit* a range of graphemes -- you can't replace a 1-codepoint
grapheme with a 6-codepoint grapheme, for example, since there's no
space in the underlying string to store that.


Hi, my idea was that when I use a grapheme range, it will abstract away 
that graphemes consist of a varying number of code points. And the docs at 
https://dlang.org/phobos/std_uni.html#byGrapheme show an example using 
this kind of range:


auto gText = text.byGrapheme;

gText.take(3);
gText.drop(3);

But maybe I need to get a better understanding of the ranges stuff too...


If you want to add/delete/change graphemes, what you *really* want is to
use an array of Graphemes:

Grapheme[] editableGraphs;

You can then splice it, insert stuff, delete stuff, whatever.

When you're done with it, convert it back to string with something like
this:

string result = editableGraphs.map!(g => g[]).joiner.to!string;


I played around with this approach...

string r1 = "Robert M. Münch";
// Code-Units  = 16
// Code-Points = 15
// Graphemes   = 15

Grapheme[] gr1 = r1.byGrapheme.array;
writeln(" Text = ", gr1.map!(g => g[]).joiner.to!string);
//  Text = obert M. Münch
writeln("wText = ", gr1.map!(g => g[]).joiner.to!wstring);
//  wText = obert M. Münch
writeln("dText = ", gr1.map!(g => g[]).joiner.to!dstring);
//  dText = obert M. Münch

Why is the first letter missing? Is this a bug?
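
The three counts in the comments above can be checked with a short sketch. One assumption: the "ü" is written as the precomposed code point U+00FC so the result does not depend on how the source file is encoded.

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // U+00FC is one code point but two UTF-8 code units.
    string r1 = "Robert M. M\u00FCnch";

    writeln("Code-Units  = ", r1.length);                // 16
    writeln("Code-Points = ", r1.walkLength);            // 15
    writeln("Graphemes   = ", r1.byGrapheme.walkLength); // 15
}
```

If the name were typed with "u" followed by the combining diaeresis U+0308 instead, the code point count would rise to 16 while the grapheme count stayed at 15.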

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: What type does byGrapheme() return?

2019-12-29 Thread Robert M. Münch via Digitalmars-d-learn

On 2019-12-27 17:54:28 +, Steven Schveighoffer said:

This is the rub with ranges. You need to use typeof. There's no other 
way to do it, because the type returned by byGrapheme depends on the 
type of Range.


Hi, ok, thanks a lot. IMO this is the kind of fundamental 
information that people using D (beginners, casual programmers, etc.) 
need in order to understand how things fit together.



If you know what type Range is, it would be:

struct S
{
   typeof(string.init.byGrapheme()) gText;
   // or
   alias GRange = typeof(string.init.byGrapheme());
   GRange gText;
}


Ah... I didn't know that I can use a basic type "string" combined with 
".init" to manually build the type. Neat...


Subbing in whatever your real range is for `string`. Or if it's the result 
of a bunch of adapters, use the whole call chain with typeof.


Ok, and these are good candidates for alias definitions to avoid 
re-typing it many times.


Why not just declare the real range type? Because it's going to be 
ugly, especially if your underlying range is the result of other range 
algorithms. And really, typeof is going to be the better mechanism, 
even if it's not the best looking thing.


I think I got it... thanks a lot.

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: What type does byGrapheme() return?

2019-12-27 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Dec 27, 2019 at 06:26:58PM +0100, Robert M. Münch via 
Digitalmars-d-learn wrote:
> I love these documentation lines in the D docs:
> 
>   auto byGrapheme(Range)(Range range)
> 
> How should I know what auto is? Why not write the explicit type so
> that I know what to expect? When declaring a variable as class/struct
> member I can't use auto but need the explicit type...
> 
> I used typeof() but that doesn't help a lot:
> 
> gText = [Grapheme(53, 0, 0, 72057594037927936, [83, , 1)]Result!string
> 
> I want to iterate a string byGrapheme so that I can add, delete,
> change graphemes.
[...]

Since graphemes are variable-length in terms of code points, you can't
exactly *edit* a range of graphemes -- you can't replace a 1-codepoint
grapheme with a 6-codepoint grapheme, for example, since there's no
space in the underlying string to store that.

If you want to add/delete/change graphemes, what you *really* want is to
use an array of Graphemes:

Grapheme[] editableGraphs;

You can then splice it, insert stuff, delete stuff, whatever.

When you're done with it, convert it back to string with something like
this:

string result = editableGraphs.map!(g => g[]).joiner.to!string;
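
A rough sketch of the Grapheme[] editing approach. Assumption: instead of map!(g => g[]), it copies the code points out inside a foreach over the array, because elsewhere in this thread the map form was reported to drop characters. `toUtf8` is a hypothetical helper name, not a Phobos function.

```d
import std.array : appender, array;
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

// Hypothetical helper: re-encode the graphemes as UTF-8 without keeping
// any slice wrapper alive beyond the loop body.
string toUtf8(Grapheme[] graphemes)
{
    auto buf = appender!string();
    foreach (ref g; graphemes)
        foreach (i; 0 .. g.length)
            buf.put(g[i]); // g[i] is a dchar; the appender encodes it
    return buf.data;
}

void main()
{
    Grapheme[] gs = "Bla\u0301h!".byGrapheme.array;

    // Drop the third grapheme ("a" plus the combining acute U+0301).
    gs = gs[0 .. 2] ~ gs[3 .. $];

    writeln(toUtf8(gs)); // prints "Blh!"
}
```

Indexing with g[i] returns each code point by value, so nothing here refers back to a Grapheme that may already have gone out of scope.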


T

-- 
The irony is that Bill Gates claims to be making a stable operating system and 
Linus Torvalds claims to be trying to take over the world. -- Anonymous


Re: What type does byGrapheme() return?

2019-12-27 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/27/19 12:26 PM, Robert M. Münch wrote:

I love these documentation lines in the D docs:

 auto byGrapheme(Range)(Range range)

How should I know what auto is? Why not write the explicit type so that 
I know what to expect? When declaring a variable as class/struct member 
I can't use auto but need the explicit type...


I used typeof() but that doesn't help a lot:

gText = [Grapheme(53, 0, 0, 72057594037927936, [83, , 1)]Result!string

I want to iterate a string byGrapheme so that I can add, delete, change 
graphemes.




This is the rub with ranges. You need to use typeof. There's no other 
way to do it, because the type returned by byGrapheme depends on the 
type of Range.


If you know what type Range is, it would be:

struct S
{
   typeof(string.init.byGrapheme()) gText;
   // or
   alias GRange = typeof(string.init.byGrapheme());
   GRange gText;
}

Subbing in whatever your real range is for `string`. Or if it's the result 
of a bunch of adapters, use the whole call chain with typeof.


Why not just declare the real range type? Because it's going to be ugly, 
especially if your underlying range is the result of other range 
algorithms. And really, typeof is going to be the better mechanism, even 
if it's not the best looking thing.
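
A minimal, compilable sketch of the alias variant, assuming the underlying range is a plain `string`. The `main` function and the sample text are only there to exercise the member.

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Name the range type via typeof instead of spelling it out by hand.
alias GRange = typeof(string.init.byGrapheme());

struct S
{
    GRange gText;
}

void main()
{
    auto s = S("Bla\u0301h".byGrapheme);
    writeln(s.gText.walkLength); // prints 4: "B", "l", "a" + U+0301, "h"
}
```

If the source were a wstring or the result of other adapters, the expression inside typeof would change accordingly while the struct stays the same.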


-Steve