Re: Odd behaviour of std.range

2022-02-22 Thread Ali Çehreli via Digitalmars-d-learn

On 2/22/22 09:25, frame wrote:

> Well, I think it's ok for strings but it shouldn't do it for simple
> arrays

string is a simple array as well, just with immutable(char) as its elements. 
It is just an alias:


  alias string = immutable(char)[];

> where it's intentional that I want to process the character and
> not a UTF-8 codepoint.

I understand how auto decoding can be bad, but I doubt you need to 
process a char. char is a UTF-8 code unit, often one of several bytes 
that together represent a Unicode character: a byte that encodes the 
information, not the information itself. That code unit includes encoding 
bits that tell the decoder whether it is the first code unit of a 
character or a continuation code unit. Not many programmers will ever 
need to write code to decode UTF-8.
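Here is a small sketch that makes the code-unit point concrete by inspecting the two code units of `é`. It assumes the source file is saved as UTF-8, which is D's default. The bit masks come from the UTF-8 encoding rules, not from any Phobos API.

```d
void main()
{
    string s = "é";        // U+00E9, stored in UTF-8 as the bytes 0xC3 0xA9
    assert(s.length == 2); // .length counts code units (chars), not characters

    // The lead byte of a two-byte sequence has the bit pattern 110xxxxx ...
    assert((s[0] & 0b1110_0000) == 0b1100_0000);
    // ... and every continuation byte has the bit pattern 10xxxxxx.
    assert((s[1] & 0b1100_0000) == 0b1000_0000);
}
```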


Ali



Re: Odd behaviour of std.range

2022-02-22 Thread frame via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 17:33:18 UTC, H. S. Teoh wrote:
> On Tue, Feb 22, 2022 at 05:25:18PM +, frame via 
> Digitalmars-d-learn wrote:
>> On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
>>
>>> Welcome to the world of auto decoding, D's million dollar 
>>> mistake.
>>
>> Well, I think it's ok for strings but it shouldn't do it for 
>> simple arrays
> [...]
>
> In D, a string *is* an array. `string` is just an alias for
> `immutable(char)[]`.


I know, but it's also a type that says "this data belongs 
together, the characters will not change, it's finalized", and it 
makes sense that it can contain several bytes combined into one 
code point. `char[]` is just an array to work with; it should be 
seen as a collection of single characters. If you want auto 
decoding, use a string instead.


Re: Odd behaviour of std.range

2022-02-22 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Feb 22, 2022 at 05:25:18PM +, frame via Digitalmars-d-learn wrote:
> On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> 
> > Welcome to the world of auto decoding, D's million dollar mistake.
> 
> Well, I think it's ok for strings but it shouldn't do it for simple
> arrays
[...]

In D, a string *is* an array. `string` is just an alias for
`immutable(char)[]`.
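The sketch below shows why the string/char[] distinction does not exist in the type system. The alias is checked at compile time, and a mutable `char[]` goes through the same decoding `front` (imported here from std.range.primitives) as a `string`.

```d
import std.range.primitives : front;

void main()
{
    // `string` is only an alias for an array of immutable chars.
    static assert(is(string == immutable(char)[]));

    // Auto decoding keys on the element type (char), not on immutability,
    // so a mutable char[] is decoded exactly like a string.
    char[] b = "é".dup;
    static assert(is(typeof(b.front) == dchar));
    assert(b.front == '\u00E9'); // one decoded code point, not the byte 0xC3
}
```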


T

-- 
Gone Chopin. Bach in a minuet.


Re: Odd behaviour of std.range

2022-02-22 Thread frame via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:

> Welcome to the world of auto decoding, D's million dollar 
> mistake.


Well, I think it's ok for strings but it shouldn't do it for 
simple arrays where it's intentional that I want to process the 
character and not a UTF-8 code point.


Thank you all.


Re: Odd behaviour of std.range

2022-02-22 Thread frame via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 12:53:03 UTC, Adam D Ruppe wrote:

> On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
>
>> What am I missing here? Is this some UTF conversion issue?
>
> `front` is a Phobos function. Phobos treats char arrays 
> differently from all other arrays.


Ah, OK. It directly attaches `front` to the string, regardless of 
the function used. That is the problem.
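The "attaching" described here is D's uniform function call syntax (UFCS). `s.front` is rewritten to a call of the free function `front` from std.range.primitives. This small sketch contrasts that with built-in indexing, which the language handles itself.

```d
import std.range.primitives : front;

void main()
{
    string s = "é";

    // UFCS rewrites s.front to front(s), a free function Phobos provides
    // for arrays; for char arrays it decodes one code point.
    assert(s.front == front(s));
    static assert(is(typeof(front(s)) == dchar));
    assert(front(s) == '\u00E9');

    // Built-in indexing is part of the language and yields a raw code unit.
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s[0] == 0xC3);
}
```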



> It was a naive design flaw that nobody has the courage to fix.


> ... or ask why you're doing range operations on a string in the 
> first place and see if the behavior actually kinda makes sense 
> for you.


Because I needed a function similar to `tail` that takes care of 
the length. Even though it's trivial to implement it myself, I 
thought it was better to use a function that already exists.
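If all you need is a `tail`-like slice of the last n elements, plain array slicing avoids the range machinery entirely. This is a minimal sketch. `lastN` is a hypothetical helper name, not a Phobos function.

```d
import std.algorithm.comparison : min;

// Hypothetical helper (not part of Phobos): the last n elements of an
// array, or the whole array if it is shorter. Slicing is a language
// feature, so no decoding happens and the element type is preserved.
T[] lastN(T)(T[] arr, size_t n)
{
    return arr[$ - min(n, arr.length) .. $];
}

void main()
{
    char[] b = "hello".dup;
    assert(lastN(b, 2) == "lo");
    assert(lastN(b, 10) == "hello");
    static assert(is(typeof(lastN(b, 2)) == char[]));
}
```

Because the helper counts code units, it can cut a multi-byte character in half at the start of the result. That is the trade-off of treating `char[]` as plain bytes.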





Re: Odd behaviour of std.range

2022-02-22 Thread bauss via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:

> What am I missing here? Is this some UTF conversion issue?
>
> ```d
> string a;
> char[] b;
>
> pragma(msg, typeof(a.take(1).front)); // dchar
> pragma(msg, typeof(b.take(1).front)); // dchar
> ```


Welcome to the world of auto decoding, D's million dollar mistake.


Re: Odd behaviour of std.range

2022-02-22 Thread Paul Backus via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:

> What am I missing here? Is this some UTF conversion issue?
>
> ```d
> string a;
> char[] b;
>
> pragma(msg, typeof(a.take(1).front)); // dchar
> pragma(msg, typeof(b.take(1).front)); // dchar
> ```


This is a feature of the D standard library known as "auto 
decoding":

> as a convenience, when iterating over a string using the range 
> functions, each element of strings and wstrings is converted 
> into a UTF-32 code-point as each item. This practice, known as 
> auto decoding, means that
>
> `static assert(is(typeof(utf8.front) == dchar));`

Source: https://tour.dlang.org/tour/en/gems/unicode
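Phobos exposes both views of a string as traits in std.range.primitives. `ElementEncodingType` is what the array stores, and `ElementType` is what range functions see. This short sketch shows both, plus the resulting difference between `.length` and `walkLength`.

```d
import std.range.primitives : ElementEncodingType, ElementType, walkLength;

void main()
{
    string s = "héllo";

    // What the array stores: UTF-8 code units.
    static assert(is(ElementEncodingType!string == immutable(char)));
    // What range functions see: decoded code points.
    static assert(is(ElementType!string == dchar));

    assert(s.length == 6);     // code units ('é' takes two)
    assert(s.walkLength == 5); // code points, counted by decoding
}
```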


Re: Odd behaviour of std.range

2022-02-22 Thread Adam D Ruppe via Digitalmars-d-learn

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:

> What am I missing here? Is this some UTF conversion issue?


`front` is a Phobos function. Phobos treats char arrays differently 
from all other arrays.
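This sketch illustrates that special case by comparing `take`/`front` on `int[]` and `ubyte[]` with `char[]`. It assumes `front` is imported from std.range.primitives, as a plain `import std.range` would also do.

```d
import std.range : take;
import std.range.primitives : front;

void main()
{
    int[] ints = [1, 2, 3];
    ubyte[] bytes = [0xC3, 0xA9];
    char[] chars = "abc".dup;

    // For ordinary arrays, take(1).front keeps the element type ...
    static assert(is(typeof(ints.take(1).front) == int));
    static assert(is(typeof(bytes.take(1).front) == ubyte));

    // ... but char arrays are special-cased and decoded to dchar.
    static assert(is(typeof(chars.take(1).front) == dchar));
    assert(chars.take(1).front == 'a');
}
```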


It was a naive design flaw that nobody has the courage to fix.

Either just don't use Phobos on strings (the language itself 
treats them sanely: you can foreach over them, etc.), call 
`.representation` (from std.string) on them before passing them to 
any range function, or ask why you're doing range operations on a 
string in the first place and see if the behavior actually kinda 
makes sense for you.
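This is a sketch of the first two options: built-in `foreach` walks code units, and `representation` from std.string reinterprets the chars as `ubyte`, which Phobos treats like any other array.

```d
import std.range : take;
import std.range.primitives : front;
import std.string : representation;
import std.traits : Unqual;

void main()
{
    string a = "héllo";

    // Option 1: built-in foreach walks code units and never decodes.
    size_t units;
    foreach (c; a)
    {
        static assert(is(Unqual!(typeof(c)) == char));
        ++units;
    }
    assert(units == 6); // 'é' contributes two code units

    // Option 2: representation views the same memory as ubytes, which
    // Phobos does not special-case.
    auto bytes = a.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));
    assert(bytes.take(1).front == 'h');
}
```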


Odd behaviour of std.range

2022-02-22 Thread frame via Digitalmars-d-learn

What am I missing here? Is this some UTF conversion issue?

```d
string a;
char[] b;

pragma(msg, typeof(a.take(1).front)); // dchar
pragma(msg, typeof(b.take(1).front)); // dchar
```
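Another opt-out in Phobos is `std.utf.byCodeUnit`, which wraps a char array in a random-access range of code units. This sketch rewrites the example above with it. For a `string` the element type stays `immutable(char)`, so the checks compare through `Unqual`.

```d
import std.range : take;
import std.traits : Unqual;
import std.utf : byCodeUnit;

void main()
{
    string a = "héllo";
    char[] b = a.dup;

    // byCodeUnit turns the array into a range of code units, so take/front
    // no longer decode to dchar.
    static assert(is(Unqual!(typeof(a.byCodeUnit.take(1).front)) == char));
    static assert(is(Unqual!(typeof(b.byCodeUnit.take(1).front)) == char));

    assert(b.byCodeUnit.take(1).front == 'h');
    // Index 1 is now the first byte of 'é', not a decoded character.
    assert(a.byCodeUnit[1] == 0xC3);
}
```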