Re: Auto-casting in range based functions?

2012-05-13 Thread Jonathan M Davis
On Sunday, May 13, 2012 19:49:00 Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have
> been running into the following issue:
> 
> ---
> import std.algorithm;
> 
> struct Delim {
>  char delim;
>  this(char d) {
>  delim = d;
>  }
> }
> 
> void main() {
>  char[] d = ['a', 'b', 'c'];
>  auto delims = map!Delim(d);
> }
> 
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error:
> constructor test.Delim.this (char d) is not callable using
> argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot
> implicitly convert expression ('\U') of type dchar to char
> 
> */
> 
> ---
> 
> As someone who most of the time doesn't need to handle unicode,
> is there a way I can convince these functions to not upcast char
> to dchar?  I can't think of a way to make the code more explicit
> in its typing.

_All_ string types are considered ranges of dchar and treated as such. That 
means that narrow strings (e.g. arrays of char or wchar) are not random-access 
ranges and have no length property as far as range-based functions are 
concerned. So, you can _never_ have char[] treated as a range of char by any 
Phobos functions. char[] is UTF-8 by definition, and range-based functions in 
Phobos operate on code points, not code units.
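
A minimal sketch of what that means in practice, assuming a Phobos that 
auto-decodes narrow strings as described above. The helper name codePoints 
is hypothetical and only there for illustration; the traits come from 
std.range.

```d
import std.range; // ElementType, hasLength, isRandomAccessRange, walkLength

// Narrow strings are presented to range-based code as ranges of dchar.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));
static assert(!isRandomAccessRange!(char[]));
static assert(!hasLength!(char[]));

// dchar[] and ubyte[] are treated as ordinary arrays.
static assert(isRandomAccessRange!(dchar[]));
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(hasLength!(ubyte[]));

// Hypothetical helper: counts code points by walking the range.
size_t codePoints(const(char)[] s)
{
    return s.walkLength;
}

void main()
{
    string s = "h\u00e9llo";     // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);       // the array length counts code units
    assert(codePoints(s) == 5);  // the range view counts code points
}
```

The array itself still has .length and indexing; it is only the range 
traits, and therefore the range-based functions, that see dchar.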

If you want a char[] to be treated as a range of individual code units, then 
you're going to have to pass it as ubyte[] instead, e.g.

char[] d = ['a', 'b', 'c'];
auto delims = map!Delim(cast(ubyte[])d);

Now, personally, I would argue that you should just use dchar, not char, 
because regardless of what you are or aren't doing with unicode right now, the 
odds are that you'll end up processing unicode at some point, and if you're in 
the habit of using char, you're going to get all kinds of bugs. So, if you 
just did

import std.algorithm;

struct Delim
{
    dchar delim;

    this(dchar d)
    {
        delim = d;
    }
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(d);
}

then it should work just fine. And if you really need a char instead of dchar 
for some reason, you can always just use std.conv.to - to!char(value) - which 
will then throw if you're trying to convert a code point that won't fit in a 
char.
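
A small sketch of that conversion, assuming std.conv.to reports the failure 
with a ConvException (or a subclass of it). The helper name tryNarrow is 
hypothetical. Note that "fit" here is about the numeric value: a code point 
in the range 0x80 to 0xFF should convert without throwing, even though the 
resulting char is not valid UTF-8 on its own.

```d
import std.conv : to, ConvException;

// Hypothetical helper: narrows a code point to char and reports failure
// through the return value instead of letting the exception escape.
bool tryNarrow(dchar c, out char result)
{
    try
    {
        result = to!char(c);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}

void main()
{
    char c;
    assert(tryNarrow('a', c) && c == 'a');  // ASCII narrows cleanly
    assert(!tryNarrow('\u4e16', c));        // far too large for a char
}
```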

In general, any code that declares a standalone variable of type char or 
wchar, rather than using those types only as array elements, is a red flag 
that indicates a likely bug or bad design. In specific circumstances you may 
need to do so, but in general it's just asking for bugs. And you're going to 
be fighting Phobos all the time if you try to use ranges of code units rather 
than ranges of code points.

- Jonathan M Davis


Re: Auto-casting in range based functions?

2012-05-13 Thread Artur Skawina
On 05/13/12 19:49, Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have been running 
> into the following issue:
> 
> ---
> import std.algorithm;
> 
> struct Delim {
> char delim;
> this(char d) {
> delim = d;
> }
> }
> 
> void main() {
> char[] d = ['a', 'b', 'c'];
> auto delims = map!Delim(d);
> }
> 
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: constructor 
> test.Delim.this (char d) is not callable using argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot implicitly 
> convert expression ('\U') of type dchar to char
> 
> */
> 
> ---
> 
> As someone who most of the time doesn't need to handle unicode, is there a 
> way I can convince these functions to not upcast char to dchar?  I can't 
> think of a way to make the code more explicit in its typing.

Well, if you don't want/need utf8 at all:

   alias ubyte ascii;

   int main() {
       ascii[] d = ['a', 'b', 'c'];
       auto delims = map!Delim(d);
       //...

and if you want to avoid utf8 just for this case (i.e. you "know" 'd[]'
contains just ascii), something like this should work:

char[] d = ['a', 'b', 'c'];
auto delims = map!((c){assert(c<128); return Delim(cast(char)c);})(d);

(it's probably more efficient when written as

auto delims = map!Delim(cast(ascii[])d);

but you lose the safety checks)
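
A sketch comparing the two variants side by side. The helper names 
toDelimsChecked and toDelimsUnchecked are hypothetical, and the checked 
version uses enforce rather than assert, on the assumption that the check 
should survive a -release build (where asserts are compiled out).

```d
import std.algorithm : map;
import std.array : array;
import std.exception : assertThrown, enforce;

struct Delim
{
    char delim;
    this(char d) { delim = d; }
}

// Hypothetical helper: checked, produces one Delim per decoded code point.
Delim[] toDelimsChecked(char[] d)
{
    return d.map!((dchar c) {
        enforce(c < 128, "non-ASCII delimiter");
        return Delim(cast(char) c);
    }).array;
}

// Hypothetical helper: unchecked, produces one Delim per code unit.
Delim[] toDelimsUnchecked(char[] d)
{
    return map!Delim(cast(ubyte[]) d).array;
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    assert(toDelimsChecked(d).length == 3);
    assert(toDelimsUnchecked(d)[1].delim == 'b');

    // With non-ASCII input the two variants diverge.
    char[] e = "\u00e9".dup;                  // one code point, two code units
    assert(toDelimsUnchecked(e).length == 2); // silently splits the character
    assertThrown(toDelimsChecked(e));         // the check catches it
}
```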

artur


Auto-casting in range based functions?

2012-05-13 Thread Andrew Stanton
I have been playing around with D as a scripting tool and have 
been running into the following issue:


---
import std.algorithm;

struct Delim {
    char delim;
    this(char d) {
        delim = d;
    }
}

void main() {
    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(d);
}

/*
Compiling gives me the following error:
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: 
constructor test.Delim.this (char d) is not callable using 
argument types (dchar)
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot 
implicitly convert expression ('\U') of type dchar to char


*/

---

As someone who most of the time doesn't need to handle unicode, 
is there a way I can convince these functions to not upcast char 
to dchar?  I can't think of a way to make the code more explicit 
in its typing.