Re: Handling arbitrary char ranges

2016-04-20 Thread ag0aep6g via Digitalmars-d-learn

On 21.04.2016 04:35, Alex Parrill wrote:

On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:

On 20.04.2016 23:59, Alex Parrill wrote:

[...]

That's not assigning the elements of a void[]; it's just changing what
the slice points to and adjusting the length, like doing `void* ptr =
someOtherPtr;`


True, but assigning elements is possible via slices as shown.

[...]

It only seems to work on arrays, not arbitrary ranges, sliceable or not.
Though see below.


Yes, assigning slices and more complex vector operations only works with 
dynamic arrays.


[...]

 auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);

[...]

 auto written = schema.encode(range.front, currentPos);

[...]

You're "converting" chars to UTF-8 here, right? That's a nop. char is
a UTF-8 code unit already.


It can be either chars, wchars, or dchars.


Your range specifically has element type char, though. Not wchar or 
dchar. And Matt Kline wants to work on char ranges (and maybe string), 
not on arbitrary ranges of char/wchar/dchar.


[...]

byChar would work. byWchar and byDchar might cause endianness issues.


Easily combined with the endianness functions from std.bitmanip:

void main()
{
    import std.algorithm: equal, map;
    import std.bitmanip: nativeToBigEndian, nativeToLittleEndian;
    import std.utf: byWchar;

    string utf8 = "foobär";
    // Each wchar becomes a ubyte[2] in the requested byte order.
    auto utf16le = utf8.byWchar.map!nativeToLittleEndian;
    auto utf16be = utf8.byWchar.map!nativeToBigEndian;

    assert(equal(utf16le,
        [['f', 0], ['o', 0], ['o', 0], ['b', 0], [0xE4, 0], ['r', 0]]));

    assert(equal(utf16be,
        [[0, 'f'], [0, 'o'], [0, 'o'], [0, 'b'], [0, 0xE4], [0, 'r']]));
}



Re: Handling arbitrary char ranges

2016-04-20 Thread Alex Parrill via Digitalmars-d-learn

On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:

On 20.04.2016 23:59, Alex Parrill wrote:

On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:

[...]


First, you can't assign anything to a void[], for the same reason you
can't dereference a void*. This includes the slice assignment that you
are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.


Not true. You can assign any dynamic array to a void[].


That's not assigning the elements of a void[]; it's just changing 
what the slice points to and adjusting the length, like doing 
`void* ptr = someOtherPtr;`


Regarding vector notation, the spec doesn't seem to mention how 
it interacts with void[], but dmd accepts this no problem:


int[] i = [1, 2, 3];
auto v = new void[](3 * int.sizeof);
v[] = i[];



It only seems to work on arrays, not arbitrary ranges, sliceable 
or not. Though see below.



[...]
Second, don't use slicing on ranges (unless you need it). Not all ranges
support it...


As far as I see, the slicing code is guarded by `static if 
(isArray!T)`. Arrays support slicing.


[...]

Instead, use a loop (or maybe `put`) to fill the array.


That's what's done in the `else` path, no?


Yes, I did not see the static if condition, my bad.


Third, don't treat text as bytes; encode your characters.

 auto schema = EncodingScheme.create("utf-8");
 auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);


 auto buf = new ubyte[](100);
 auto currentPos = buf;
 while(!range.empty && schema.encodedLength(range.front) <=
currentPos.length) {
 auto written = schema.encode(range.front, currentPos);
 currentPos = currentPos[written..$];
 range.popFront();
 }
 buf = buf[0..buf.length - currentPos.length];


You're "converting" chars to UTF-8 here, right? That's a nop. 
char is a UTF-8 code unit already.


It can be either chars, wchars, or dchars.

(PS there ought to be a range in Phobos that encodes each character,
something like map maybe)


std.utf.byChar and friends:

https://dlang.org/phobos/std_utf.html#.byChar


byChar would work. byWchar and byDchar might cause endianness 
issues.


Re: Handling arbitrary char ranges

2016-04-20 Thread ag0aep6g via Digitalmars-d-learn

On 20.04.2016 23:59, Alex Parrill wrote:

On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:

[...]


First, you can't assign anything to a void[], for the same reason you
can't dereference a void*. This includes the slice assignment that you
are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.


Not true. You can assign any dynamic array to a void[].

Regarding vector notation, the spec doesn't seem to mention how it 
interacts with void[], but dmd accepts this no problem:


int[] i = [1, 2, 3];
auto v = new void[](3 * int.sizeof);
v[] = i[];


[...]

Second, don't use slicing on ranges (unless you need it). Not all ranges
support it...


As far as I see, the slicing code is guarded by `static if (isArray!T)`. 
Arrays support slicing.


[...]

Instead, use a loop (or maybe `put`) to fill the array.


That's what's done in the `else` path, no?


Third, don't treat text as bytes; encode your characters.

 auto schema = EncodingScheme.create("utf-8");
 auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);

 auto buf = new ubyte[](100);
 auto currentPos = buf;
 while(!range.empty && schema.encodedLength(range.front) <=
currentPos.length) {
 auto written = schema.encode(range.front, currentPos);
 currentPos = currentPos[written..$];
 range.popFront();
 }
 buf = buf[0..buf.length - currentPos.length];


You're "converting" chars to UTF-8 here, right? That's a nop. char is a 
UTF-8 code unit already.



(PS there ought to be a range in Phobos that encodes each character,
something like map maybe)


std.utf.byChar and friends:

https://dlang.org/phobos/std_utf.html#.byChar
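
As a rough illustration of what byChar and its siblings do, here is a minimal sketch. It assumes a reasonably recent Phobos; the sample text and variable names are made up for the example. The ranges are lazy and transcode on the fly, and byCodeUnit is used only to get at the raw code units for comparison.

```d
import std.algorithm.comparison : equal;
import std.utf : byChar, byCodeUnit, byWchar;

void main()
{
    string s = "b\u00E4r";   // UTF-8, 4 code units
    wstring w = "b\u00E4r"w; // UTF-16, 3 code units

    // byChar re-encodes the UTF-16 input as UTF-8 code units.
    assert(equal(w.byChar, s.byCodeUnit));

    // byWchar goes the other way, from UTF-8 to UTF-16 code units.
    assert(equal(s.byWchar, w.byCodeUnit));
}
```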


Re: Handling arbitrary char ranges

2016-04-20 Thread Alex Parrill via Digitalmars-d-learn

On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:

[...]


First, you can't assign anything to a void[], for the same reason 
you can't dereference a void*. This includes the slice assignment 
that you are trying to do in `buf[0..minLen] = 
remainingData[0..minLen];`.


Cast the buffer to a `ubyte[]` buffer first, then you can assign 
bytes to it.


auto bytebuf = cast(ubyte[]) buf;
bytebuf[0] = 123;

Second, don't use slicing on ranges (unless you need it). Not all 
ranges support it...


auto buf = [1,2,3];
auto rng = filter!(x => x != 1)(buf);
pragma(msg, hasSlicing!(typeof(rng))); // false

... and even ranges that support it don't support assigning to an 
array by slice:


auto buf = new int[](3);
// Error: cannot implicitly convert expression (only(1, 2, 3).opSlice())
// of type OnlyResult!(int, 3u) to int[]
buf[] = only(1,2,3)[];


Instead, use a loop (or maybe `put`) to fill the array.
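
A minimal sketch of both options, assuming an int buffer as in the snippets above (the variable names are only for illustration). std.algorithm's copy is used here in place of `put`; it fills the target and returns whatever part of the target was left unfilled.

```d
import std.algorithm.iteration : filter;
import std.algorithm.mutation : copy;
import std.range : only;

void main()
{
    auto buf = new int[](3);

    // copy writes the source elements into buf and returns the unfilled rest.
    auto rest = only(1, 2, 3).copy(buf);
    assert(buf == [1, 2, 3]);
    assert(rest.length == 0);

    // The same idea as an explicit loop, for a range that has no slicing.
    auto dst = new int[](3);
    size_t i;
    foreach (x; [1, 2, 3, 4].filter!(n => n != 1))
    {
        if (i == dst.length)
            break; // buffer is full
        dst[i++] = x;
    }
    assert(dst[0 .. i] == [2, 3, 4]);
}
```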

Third, don't treat text as bytes; encode your characters.

auto schema = EncodingScheme.create("utf-8");
auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);


auto buf = new ubyte[](100);
auto currentPos = buf;
	while(!range.empty && schema.encodedLength(range.front) <= 
currentPos.length) {

auto written = schema.encode(range.front, currentPos);
currentPos = currentPos[written..$];
range.popFront();
}
buf = buf[0..buf.length - currentPos.length];

(PS there ought to be a range in Phobos that encodes each 
character, something like map maybe)


Re: Handling arbitrary char ranges

2016-04-20 Thread ag0aep6g via Digitalmars-d-learn

On 20.04.2016 22:09, Matt Kline wrote:

I'd rather not write my own cURL wrapper. Do you think it would be
worthwhile starting a PR for Phobos to get it changed to ubyte[]? A
reading of https://dlang.org/spec/arrays.html indicates the main
difference is that the GC crawls void[], but I would think that
wouldn't matter for a short-lived buffer being shoveled into libcurl,
which is, by nature, a copy of the same data somewhere else in your
program...


I don't know if a PR would be worthwhile. What you say makes sense to 
me, but I am by no means an expert here.


As you say, void[] is the safer default with regards to the GC.

It's also simpler to get a void[] from an arbitrary array, as any array 
implicitly converts to void[] (given compatible qualifiers). Getting a 
void[] from an arbitrary range isn't that simple, but getting a ubyte[] 
from an int[] requires some work, too.
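
To make the difference concrete, a small sketch of the conversions described above (variable names are illustrative only). It shows the implicit conversion to void[], the qualifier requirement, and the explicit cast needed for ubyte[].

```d
void main()
{
    int[] ints = [1, 2, 3];

    // Any mutable array converts implicitly to void[];
    // the length is then counted in bytes.
    void[] raw = ints;
    assert(raw.length == 3 * int.sizeof);

    // An immutable array needs a compatible qualifier on the void[].
    string s = "foo";
    const(void)[] rawText = s;
    assert(rawText.length == 3);
    static assert(!__traits(compiles, { void[] bad = s; }));

    // ubyte[] takes an explicit cast, which reinterprets the same memory.
    ubyte[] bytes = cast(ubyte[]) ints;
    assert(bytes.length == raw.length);
    assert(cast(void*) bytes.ptr is raw.ptr);
}
```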


void[] is possibly the better option all around.


Re: Handling arbitrary char ranges

2016-04-20 Thread Matt Kline via Digitalmars-d-learn

On Wednesday, 20 April 2016 at 20:00:58 UTC, ag0aep6g wrote:

Maybe I've missed it, but you didn't say where the HTTP type 
comes from, did you?


std.net.curl: https://dlang.org/phobos/std_net_curl.html#.HTTP
(Sorry, I assumed that was a given since it's a standard library 
type. Poor assumption, perhaps.)


I'd rather not write my own cURL wrapper. Do you think it would 
be worthwhile starting a PR for Phobos to get it changed to 
ubyte[]? A reading of https://dlang.org/spec/arrays.html 
indicates the main difference is that the GC crawls void[], but 
I would think that wouldn't matter for a short-lived buffer being 
shoveled into libcurl, which is, by nature, a copy of the same 
data somewhere else in your program...


Re: Handling arbitrary char ranges

2016-04-20 Thread ag0aep6g via Digitalmars-d-learn

On 20.04.2016 21:48, Matt Kline wrote:

I don't have an option here, do I? I assume HTTP.onSend doesn't take a
`delegate size_t(ubyte[])` instead of a `delegate size_t(void[])`, and
that the former isn't implicitly convertible to the latter.


Maybe I've missed it, but you didn't say where the HTTP type comes from, 
did you? If it's not under your control, then yeah, I guess you have to 
deal with void[].


[...]

Is this due solely to the "auto-decode" behavior? Generally, (except
apparently in this case) don't arrays of type T qualify as InputRanges
of type T?


Yep. Generally, T[] is a range with element type T. char[], wchar[], and 
their qualified variants are the exception. And the reason is 
auto-decoding to dchar, yes.
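
The exception can be checked at compile time. A minimal sketch, assuming a Phobos version in which auto-decoding is still in effect:

```d
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Ordinary arrays: the range element type is the array element type.
static assert(is(ElementType!(int[]) == int));

// Narrow strings are the exception: front auto-decodes to dchar.
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!(wchar[]) == dchar));

void main()
{
    string s = "b\u00E4r";
    // byCodeUnit iterates code units instead of decoded code points.
    auto units = s.byCodeUnit;
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));
    assert(units.length == 4); // U+00E4 takes two UTF-8 code units
}
```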


Re: Handling arbitrary char ranges

2016-04-20 Thread Matt Kline via Digitalmars-d-learn

On Wednesday, 20 April 2016 at 19:29:22 UTC, ag0aep6g wrote:

Maybe use ubyte[] for the buffer type instead.


I don't have an option here, do I? I assume HTTP.onSend doesn't 
take a `delegate size_t(ubyte[])` instead of a `delegate 
size_t(void[])`, and that the former isn't implicitly convertible 
to the latter.


I think your problems come more from wanting to accept string, 
which simply isn't a char range


Is this due solely to the "auto-decode" behavior? Generally, 
(except apparently in this case) don't arrays of type T qualify 
as InputRanges of type T?




Re: Handling arbitrary char ranges

2016-04-20 Thread ag0aep6g via Digitalmars-d-learn

On 20.04.2016 19:09, Matt Kline wrote:

1. What is the idiomatic way to constrain the function to only take char
ranges? One might naïvely add `is(ElementType!T : char)`, but that falls
on its face due to strings "auto-decoding" their elements to dchar.
(More on that later.)


Well, string is not a char range. If you want to accept string, you have 
to special case it. Rejecting string is an option, though. The caller 
would then have to make a char range from the string. There's 
std.utf.byCodeUnit for that.
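
A sketch of the "reject string" option. The function countCodeUnits is a hypothetical stand-in for the real function; only the constraint and the call site matter here.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper: accepts only input ranges whose elements are
// (possibly qualified) char.
size_t countCodeUnits(R)(R range)
if (isInputRange!R && is(Unqual!(ElementType!R) == char))
{
    size_t n;
    foreach (c; range)
        ++n;
    return n;
}

void main()
{
    string s = "b\u00E4r";

    // A plain string is rejected: as a range, its elements are dchar.
    static assert(!__traits(compiles, countCodeUnits(s)));

    // The caller wraps it with byCodeUnit to get a range of char.
    assert(countCodeUnits(s.byCodeUnit) == 4);
}
```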



2. The function fails to compile, issuing, "cannot implicitly convert
expression (sendData[0..minLen]) of type string to void[]" on this line.
I assume this has to do with the immutability of string elements.
Specifying a non-const array of const elements is as simple as
`const(void)[]`, but how does one do this here, with a template argument?


Looks like a compiler bug to me. It works when you do it in two steps:

string sendData = "foo";
void[] buf = new void[3];
immutable(void)[] voidSendData = sendData;
buf[] = voidSendData[];


I've filed an issue:
https://issues.dlang.org/show_bug.cgi?id=15942


3. Is this needed, or is auto-decoding behavior specific to char arrays
and not other char ranges?


Auto-decoding is specific to arrays.


4. Is this a sane approach to make sure I'm dealing with ranges of
chars? Do I need to use `Unqual` to deal with const or immutable elements?


is(Foo : char) also accepts byte, ubyte, bool, and user-defined types 
with an alias this to a char.


You don't need Unqual with `: char`. Since immutable(char) is a value 
type still, it implicitly converts to char. However, if you want to 
reject those other types, and only accept char and its qualified 
variants, then you need Unqual:


is(Unqual!(ElementType!bcu) == char)
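
The two checks can be compared side by side. In this sketch, convertsToChar and isCharExactly are hypothetical names introduced only for the comparison:

```d
import std.traits : Unqual;

// Implicit-conversion test: accepts more than just char.
enum convertsToChar(T) = is(T : char);
// Exact test after stripping qualifiers.
enum isCharExactly(T) = is(Unqual!T == char);

static assert( convertsToChar!(immutable char));
static assert( convertsToChar!ubyte);
static assert( convertsToChar!bool);
static assert(!convertsToChar!dchar);

static assert( isCharExactly!(immutable char));
static assert( isCharExactly!(const char));
static assert(!isCharExactly!ubyte);
static assert(!isCharExactly!bool);

void main() {}
```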



5. This fails, claiming the right hand side can't be converted to type
void. Casting to ubyte doesn't help - so how *does* one write to an
element of a void array?


void[] is a bit of a special case. A single value of type void isn't 
really a thing. You can't write `void v = 1;`. Maybe use ubyte[] for the 
buffer type instead.


To do it with void[], I guess you'd have to slice things:

char c = 'x';
void[] buf = new void[1];
buf[0 .. 1] = (&c)[0 .. 1];



Am I making things harder than they have to be? Or is dealing with
arbitrary ranges of chars this complex? I've lost count of times
templated code wouldn't compile because dchar was sneaking in
somewhere... at least I'm in good company.
(http://forum.dlang.org/post/m01r3d$1frl$1...@digitalmars.com)


I think your problems come more from wanting to accept string, which 
simply isn't a char range, and from using void[] as the buffer type.


Re: Handling arbitrary char ranges

2016-04-20 Thread Alex Parrill via Digitalmars-d-learn

On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
I'm doing some work with a REST API, and I wrote a simple 
utility function that sets an HTTP's onSend callback to send a 
string:


[...]


IO functions usually work with octets, not characters, so an 
extra encoding step is needed. For encoding character arrays to 
UTF-#, there's std.string.representation, and std.encoding might 
have something for arbitrary ranges.


Avoid slicing ranges; not all ranges support it. If you 
absolutely need it (you don't here) then add hasSlicing to the 
constraint.


isSomeChar can tell you if a type (like the range's element type) 
is a character.
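
A small sketch of the two helpers mentioned above, with made-up sample strings. representation exposes the code units of a string as integers without copying, and isSomeChar tests whether a type is one of the character types.

```d
import std.string : representation;
import std.traits : isSomeChar;

void main()
{
    string s = "b\u00E4r";

    // UTF-8 code units viewed as bytes; no copy is made.
    immutable(ubyte)[] bytes = s.representation;
    assert(bytes == [0x62, 0xC3, 0xA4, 0x72]);

    // UTF-16 code units viewed as ushorts.
    wstring w = "b\u00E4r"w;
    immutable(ushort)[] units = w.representation;
    assert(units == [0x62, 0xE4, 0x72]);

    static assert(isSomeChar!char && isSomeChar!wchar && isSomeChar!dchar);
    static assert(!isSomeChar!ubyte);
}
```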