Re: byChunk odd behavior?

2016-03-26 Thread Hanh via Digitalmars-d-learn

On Saturday, 26 March 2016 at 08:34:04 UTC, cym13 wrote:

Sorry, it seems I completely misunderstood your goal. I thought 
that take() consumed its input (which mostly only shows that I 
really am careful about not reusing ranges). Writing a take 
that consumes shouldn't be difficult though:


import std.range, std.traits;
Take!R takeConsume(R)(auto ref R input, size_t n)
    if (isInputRange!(Unqual!R)
        && !isInfinite!(Unqual!R))
{
    auto buffer = input.take(n);
    input = input.drop(buffer.walkLength);
    return buffer;
}

but I think going with std.bitmanip/read may be the easiest in 
the end.


Turns out bitmanip is actually using a loop.

foreach(ref e; bytes)
{
  e = range.front;
  range.popFront();
}

By the way, in your code above you are actually reusing the 
range: take is followed by drop and it won't work on an input 
range like 'byChunk'. That's the problem I ran into (see first 
post).
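
A minimal sketch of a consuming take that avoids the take-then-drop reuse described above. It copies elements one at a time through front/popFront, in the same way as the bitmanip loop, and takes the range by ref so the caller's range is advanced. The name takeAndAdvance is hypothetical and not part of Phobos.

```d
import std.range;

/// Hypothetical helper (not in Phobos): copies up to `n` elements out of
/// `input` and leaves `input` positioned just after the last copied element.
ElementType!R[] takeAndAdvance(R)(ref R input, size_t n)
    if (isInputRange!R)
{
    ElementType!R[] buffer;
    buffer.reserve(n);
    foreach (_; 0 .. n)
    {
        // Stop early if the range has fewer than n elements left.
        if (input.empty)
            break;
        buffer ~= input.front;
        input.popFront();
    }
    return buffer;
}
```

Because the range is only ever advanced with popFront and never copied and re-read, this should be usable on a pure input range such as file.byChunk(n).joiner, although that case is an assumption and has not been checked here.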




Re: byChunk odd behavior?

2016-03-26 Thread cym13 via Digitalmars-d-learn

On Saturday, 26 March 2016 at 02:28:53 UTC, Hanh wrote:

On Friday, 25 March 2016 at 08:01:04 UTC, cym13 wrote:

// This consumes
auto buffer3 = range.take(4).array;
assert(buffer3 == [0, 5, 10, 15]);
}


Thanks for your help. However the last statement is incorrect. 
I am in fact looking for a version of 'take' that consumes the 
InputRange.


You can see it by doing a second take afterwards.

auto buffer3 = range.take(4).array;
assert(buffer3 == [0, 5, 10, 15]);
auto buffer4 = range.take(4).array;
assert(buffer4 == [0, 5, 10, 15]);

I haven't clearly explained my main goal. I have a large binary 
file that I need to deserialize. It's not my file and it's in a 
custom but simple format, so I would prefer not to depend on a 
third party serializer library but I will look into that.


I was thinking along the lines of:
1. Open file
2. Map a byChunk.joiner to read by chunks and present an 
iterator interface

3. Read data with std.bitmanip/read functions

Step 3. works fine as long as items are single scalar values. 
bitmanip doesn't have array readers. Obviously, I could loop 
but then I thought that for the case of a ubyte[], there would 
be a shortcut that I don't know about.


Thanks,
--h


Sorry, it seems I completely misunderstood your goal. I thought 
that take() consumed its input (which mostly only shows that I 
really am careful about not reusing ranges). Writing a take that 
consumes shouldn't be difficult though:


import std.range, std.traits;
Take!R takeConsume(R)(auto ref R input, size_t n)
    if (isInputRange!(Unqual!R)
        && !isInfinite!(Unqual!R))
{
    auto buffer = input.take(n);
    input = input.drop(buffer.walkLength);
    return buffer;
}

but I think going with std.bitmanip/read may be the easiest in 
the end.




Re: byChunk odd behavior?

2016-03-25 Thread Hanh via Digitalmars-d-learn

On Friday, 25 March 2016 at 08:01:04 UTC, cym13 wrote:

// This consumes
auto buffer3 = range.take(4).array;
assert(buffer3 == [0, 5, 10, 15]);
}


Thanks for your help. However the last statement is incorrect. I 
am in fact looking for a version of 'take' that consumes the 
InputRange.


You can see it by doing a second take afterwards.

auto buffer3 = range.take(4).array;
assert(buffer3 == [0, 5, 10, 15]);
auto buffer4 = range.take(4).array;
assert(buffer4 == [0, 5, 10, 15]);

I haven't clearly explained my main goal. I have a large binary 
file that I need to deserialize. It's not my file and it's in a 
custom but simple format, so I would prefer not to depend on a 
third party serializer library but I will look into that.


I was thinking along the lines of:
1. Open file
2. Map a byChunk.joiner to read by chunks and present an iterator 
interface

3. Read data with std.bitmanip/read functions

Step 3. works fine as long as items are single scalar values. 
bitmanip doesn't have array readers. Obviously, I could loop but 
then I thought that for the case of a ubyte[], there would be a 
shortcut that I don't know about.
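
One possible way to cover the missing array reader is a thin loop around std.bitmanip.read, sketched below. The name readArray is hypothetical and not part of Phobos; the sketch assumes the range yields ubyte elements and still holds at least count values of type T, since read does not report a short read gracefully.

```d
import std.bitmanip : read;
import std.system : Endian;

/// Hypothetical helper (not in Phobos): reads `count` values of type T
/// from a ubyte range, advancing the range as it goes.
/// Assumes the range contains at least count * T.sizeof bytes.
T[] readArray(T, Endian endianness = Endian.bigEndian, R)(ref R range, size_t count)
{
    auto result = new T[](count);
    foreach (ref element; result)
        element = range.read!(T, endianness)();
    return result;
}
```

For example, readArray!ushort(range, 2) on big-endian bytes 0, 1, 0, 2 would give the values 1 and 2 and leave the range just past those four bytes. For T = ubyte this is the same element-by-element loop as before, only wrapped in a function.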


Thanks,
--h



Re: byChunk odd behavior?

2016-03-25 Thread cym13 via Digitalmars-d-learn

On Thursday, 24 March 2016 at 07:52:27 UTC, Hanh wrote:

On Wednesday, 23 March 2016 at 19:07:34 UTC, cym13 wrote:

In Scala, 'take' consumes bytes from the iterator. So the 
same code would be

buffer = range.take(N).toArray


Then just do that!

import std.range, std.array;
auto buffer = range.take(N).array;

auto example = iota(0, 200, 5).take(5).array;
assert(example == [0, 5, 10, 15, 20]);


Well, that's what I do in the first post but you can't call it 
twice with an InputRange.


auto buffer1 = range.take(4).array; // ok
range.popFrontN(4); // not ok
auto buffer2 = range.take(4).array; // not ok


Please, take some time to reread cy's answer above.

void main(string[] args) {
    import std.range;
    import std.array;
    import std.algorithm;

    auto range = iota(0, 25, 5);

    // Will not consume (forward ranges only)
    //
    // Note however that range elements are not stored in any way by default
    // so reusing the range will also need you to recompute them each time!
    auto buffer1 = range.save.take(4).array;
    assert(buffer1 == [0, 5, 10, 15]);

    // The solution to the recomputation problem, and often the best way to
    // handle range reuse, is to store them in an array
    //
    // This is reusable at will with no redundant computation
    auto arr = range.save.array;
    assert(arr == [0, 5, 10, 15, 20]);

    // And it has a range interface too
    auto buffer2 = arr.take(4).array;
    assert(buffer2 == [0, 5, 10, 15]);

    // This consumes
    auto buffer3 = range.take(4).array;
    assert(buffer3 == [0, 5, 10, 15]);
}



Re: byChunk odd behavior?

2016-03-24 Thread Hanh via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 19:07:34 UTC, cym13 wrote:

In Scala, 'take' consumes bytes from the iterator. So the same 
code would be

buffer = range.take(N).toArray


Then just do that!

import std.range, std.array;
auto buffer = range.take(N).array;

auto example = iota(0, 200, 5).take(5).array;
assert(example == [0, 5, 10, 15, 20]);


Well, that's what I do in the first post but you can't call it 
twice with an InputRange.


auto buffer1 = range.take(4).array; // ok
range.popFrontN(4); // not ok
auto buffer2 = range.take(4).array; // not ok



Re: byChunk odd behavior?

2016-03-23 Thread cym13 via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 03:17:05 UTC, Hanh wrote:

Thanks for your help everyone.

I agree that the issue is due to the misuse of an InputRange 
but what are the semantics of 'take' when applied to an 
InputRange? It seems that calling it invalidates the range; in 
which case what is the recommended way to get a few bytes and 
keep on advancing?


Doing *anything* to a range invalidates it (or at least you 
should expect it to); a range is read-once. Never reuse a range. 
Some ranges can be saved in order to use a copy of them, but never 
expect a range to be implicitly reusable.



For instance, to read a ushort, I use
range.read!(ushort)()
Unfortunately, it reads a single value.

For now, I use a loop

foreach (i; 0..N) {
  buffer[i] = range.front;
  range.popFront();
}

Is there a more idiomatic way to do the same thing?


Two ways, the first one being for reference:

import std.range: enumerate;
// enumerate yields (index, value) pairs, in that order
foreach (index, element; range.enumerate) {
    buffer[index] = element;
}

And the other one

In Scala, 'take' consumes bytes from the iterator. So the same 
code would be

buffer = range.take(N).toArray


Then just do that!

import std.range, std.array;
auto buffer = range.take(N).array;

auto example = iota(0, 200, 5).take(5).array;
assert(example == [0, 5, 10, 15, 20]);



Re: byChunk odd behavior?

2016-03-23 Thread Chris Wright via Digitalmars-d-learn
On Wed, 23 Mar 2016 03:17:05 +, Hanh wrote:
> In Scala, 'take' consumes bytes from the iterator. So the same code
> would be buffer = range.take(N).toArray

import std.range, std.array;
auto bytes = byteRange.takeExactly(N).array;

There's also take(N), but if the range contains fewer than N elements, it 
will only give you as many as the range contains. If you're trying to 
deserialize something, takeExactly is probably better.


http://dpldocs.info/experimental-docs/std.range.takeExactly.html
http://dpldocs.info/experimental-docs/std.array.array.1.html
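
The difference between the two can be sketched as follows. The helper names readUpTo and readExactly are hypothetical; the explicit length check in readExactly is an added assumption, there because takeExactly expects the caller to guarantee that enough elements are present.

```d
import std.array : array;
import std.exception : enforce;
import std.range : take, takeExactly;

/// Hypothetical helper: returns at most n bytes, fewer if data is short.
ubyte[] readUpTo(ubyte[] data, size_t n)
{
    return data.take(n).array;
}

/// Hypothetical helper: returns exactly n bytes or throws.
/// takeExactly itself assumes the range is long enough, so check first.
ubyte[] readExactly(ubyte[] data, size_t n)
{
    enforce(data.length >= n, "not enough bytes left");
    return data.takeExactly(n).array;
}
```

With three bytes of input, readUpTo(data, 5) quietly returns three bytes, while readExactly(data, 5) throws, which is usually the more useful behavior when a truncated message should be treated as an error.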


Re: byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

Thanks for your help everyone.

I agree that the issue is due to the misuse of an InputRange 
but what are the semantics of 'take' when applied to an 
InputRange? It seems that calling it invalidates the range; in 
which case what is the recommended way to get a few bytes and 
keep on advancing?


For instance, to read a ushort, I use
range.read!(ushort)()
Unfortunately, it reads a single value.

For now, I use a loop

foreach (i; 0..N) {
  buffer[i] = range.front;
  range.popFront();
  }

Is there a more idiomatic way to do the same thing?

In Scala, 'take' consumes bytes from the iterator. So the same 
code would be

buffer = range.take(N).toArray



Re: byChunk odd behavior?

2016-03-22 Thread cy via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

input.take(3).array;
foreach (char c; input) {


Never use an input range twice. So, here's how to use it twice:

If it's a "forward range" you can use save() to get a copy to use 
later (but all the std.stdio.* ranges don't implement that). You 
can also use "std.range.tee" to send the results to an "output 
range" (something implementing put(K)(K)) while iterating over 
them.


tee can't produce two input ranges, because without caching all 
iterated items in memory, only one range can request items 
on-demand; the other must take them passively.


You could write a thing that takes an InputRange and produces a 
ForwardRange, by caching those items in memory, but at that point 
you might as well use .array and get the whole thing.


ByChunk is an input range (not a forward range), so there's 
undefined behavior when you use it twice. No bugs there, since it 
wasn't meant to be reused anyway. What it does is cache the last 
seen chunk, first iterate over that, then read more chunks from 
the file. So every time you iterate, you'll get that same last 
chunk.


It's also tricky to use input ranges after mutating their 
underlying data structure. If you seek in the file, for instance, 
then a previously created ByChunk will produce the chunk it has 
cached, and only then start reading chunks from that exact 
position in the file. A range over some sort of list, if you 
delete the current item in the list, should the range produce the 
previous item? The next item? null?


So, as a general rule, never use input ranges twice, and never 
use them after mutating the underlying data structure. Just 
recreate them if you want to do something twice, or use tee as 
mentioned above.
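
A small sketch of the tee idea, assuming the goal is to keep a record of the elements while a single pass consumes them. It uses the function overload of std.range.tee rather than the output-range overload mentioned above, and the name doubleAndRecord is hypothetical.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range : tee;
import std.typecons : tuple;

/// Hypothetical helper: makes one pass over `values`, returning the doubled
/// elements together with a record of the original elements seen by tee.
auto doubleAndRecord(int[] values)
{
    int[] seen;
    auto doubled = values
        .tee!(x => seen ~= x)   // called for each element as the pass advances
        .map!(x => x * 2)
        .array;
    return tuple(doubled, seen);
}
```

The source is iterated only once: the downstream map pulls the elements on demand, and tee passively hands each one to the recording function, which matches the on-demand versus passive split described above.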


Re: byChunk odd behavior?

2016-03-22 Thread Ali Çehreli via Digitalmars-d-learn

On 03/22/2016 12:17 AM, Hanh wrote:
> Hi all,
>
> I'm trying to process a rather large file as an InputRange and run into
> something strange with byChunk / take.
>
> void test() {
>  auto file = new File("test.txt");
>  auto input = file.byChunk(2).joiner;
>  input.take(3).array;
>  foreach (char c; input) {
>  writeln(c);
>  }
> }
>
> Let's say test.txt contains "123456".
>
> The output will be
> 3
> 4
> 5
> 6
>
> The "take" consumed one chunk from the file, but if I increase the chunk
> size to 4, then it won't.

I don't understand the issue fully but byChunk() will treat every 
character in the file. So, even the newline character(s) are considered.


> Actually, what is the easiest way to read a large file as a stream? My
> file contains a bunch of serialized messages of variable length.

If it's a text file I think I would start with File.byLine (or 
byLineCopy). Then it depends on how the messages are laid out. One per 
line? Do you know the size at the start? etc.


Alternatively, use (or examine) one of the great D serialization modules 
out there. :)


(We already need something like this in the standard library, which I 
think some people are already working on.)


Ali



Re: byChunk odd behavior?

2016-03-22 Thread Taylor Hillegeist via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h


I don't know if this helps, but it looks like since take(3) 
doesn't consume the chunk, it is not removed from the range.


import std.stdio;
import std.algorithm;
import std.range;

void main() {
    auto file = stdin;
    auto input = file.byChunk(2).joiner;

    foreach (char c; input.take(3).array) {
        writeln(c);
    }

    foreach (char c; input) {
        writeln(c);
    }
}

Produces:
1
2
3 < Got data but didn't eat the chunk.
3
4
5
6


Re: byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h


I have the feeling that it's related to the forward only nature 
of an InputRange. All would be ok with a take(N)+popFrontN 
method. I'm going to keep looking.


byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h