Re: InputRange help: (1) repeated dtor calls and (2) managing resources needing free()

2018-08-19 Thread James Blachly via Digitalmars-d-learn

On Monday, 13 August 2018 at 13:20:25 UTC, Seb wrote:

> BTW it's very uncommon for `empty` to do work; it's much more
> common to do such lazy initialization in `.front`.




Thanks Seb, that entire reply is a huge help.

By lazy initialization in `.front`, do you mean that I should 
find a way for `front` to preload the first record?


If so, could you help me understand what you mean by lazy init 
with `front`? `empty` is called before `front` upon first 
iteration through the Range, so really the init has to be done in 
the constructor, yes?





Re: InputRange help: (1) repeated dtor calls and (2) managing resources needing free()

2018-08-13 Thread Seb via Digitalmars-d-learn

On Monday, 13 August 2018 at 04:23:49 UTC, James Blachly wrote:

> On Thursday, 14 June 2018 at 00:42:25 UTC, James Blachly wrote:
>
>> ...
>> I assume the (apparent) lack of parity between ctor and dtor
>> is because the "default postblit" (which I figured out for a
>> struct means an empty `this(this)` ctor) is called when a copy
>> is made. My understanding is that I cannot disable the default
>> postblit and still act as a range, correct? Should I be
>> overloading this?
>>
>> 2. Directly related to the above, I need, when the range is
>> consumed, to free() the underlying library's iterator handle.
>> Naively, I had the destructor do this, but obviously with
>> multiple calls to ~this I end up with an error free()'ing a
>> pointer that is no longer alloc'd.  What is the correct way to
>> handle this situation in D?
>>
>> Other Range and destructor advice generally (e.g., "You should
>> totally change your design or approach to X instead") is
>> always welcomed.
>>
>> James
>
> I think I have a handle on #1 (copy of the range is made for
> consumption which is why dtor is called more often than ctor),
> but would still be interested in advice regarding #2 (as well
> as general Range and dtor advice).


> Here:
> https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/tabix.d#L98
> I need to free the library's iterator, but the Range's destructor
> is the wrong place to do this, otherwise memory is freed more than
> once.
>
> Is it a better approach to (a) somehow guard the call to
> tbx_itr_destroy or (b) create a postblit that creates a new
> iterator and pointer? (or (c), None of the above)


I would "guard" the call to tbx_itr_destroy by means of reference 
counting (see below).


> As above, my understanding is that disabling the default
> postblit prohibits acting as a Range.


That's not true. It just makes the range harder to use.
Last year, for example, it was proposed to make the ranges in
std.random non-copyable, because you don't want to accidentally
copy your random state. That didn't happen only because bigger
refactorings were planned for std.random, which sadly never
materialized.
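
As a rough illustration of "harder to use": the sketch below disables the postblit on a toy range and still drives it through `empty`/`front`/`popFront`. The names `NoCopyRange`, `drain` and `drainDemo` are made up for this example and come from neither dhtslib nor std.random. The cost, as far as I understand it, is that anything needing a copy (a plain `foreach` over the variable, passing by value) no longer compiles, so the range is handed around by `ref` and looped over explicitly.

```d
// Toy range over the integers current .. limit; copying is forbidden.
struct NoCopyRange
{
    int current;
    int limit;

    @disable this(this); // any attempt to copy is a compile-time error

    bool empty() const { return current >= limit; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// Consumes the caller's range in place; `ref` means no copy is needed.
int drain(ref NoCopyRange r)
{
    int total;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

// Builds a range over 0 .. 4 and sums it through the ref-taking helper.
int drainDemo()
{
    auto r = NoCopyRange(0, 4); // initializing from an rvalue is not a copy
    return drain(r);
}
```

With the postblit disabled there is exactly one owner, so a destructor that frees a handle would run exactly once.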


BTW it's very uncommon for `empty` to do work; it's much more
common to do such lazy initialization in `.front`.
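
One possible reading of this advice, sketched under assumptions: keep the constructor cheap and let the first call to any primitive load the first record through a small private helper, so that `empty` itself only reports state. `LazyRecords`, `prime`, `fetch` and `collectAll` are hypothetical names, and the string array stands in for the real record stream.

```d
struct LazyRecords
{
    private string[] source; // stand-in for the underlying record stream
    private string current;  // the record that front returns
    private bool primed;     // has the first record been loaded yet?
    private bool done;       // set once the stream is exhausted

    this(string[] source) { this.source = source; }

    // Loads the first record on first use; later calls do nothing.
    private void prime()
    {
        if (primed)
            return;
        primed = true;
        fetch();
    }

    // Pulls the next record, or marks the range as finished.
    private void fetch()
    {
        if (source.length == 0)
        {
            done = true;
            return;
        }
        current = source[0];
        source = source[1 .. $];
    }

    bool empty() { prime(); return done; }
    string front() { prime(); return current; }
    void popFront() { prime(); fetch(); }
}

// Collects every record, exercising empty/front/popFront via foreach.
string[] collectAll(string[] records)
{
    string[] result;
    foreach (rec; LazyRecords(records))
        result ~= rec;
    return result;
}
```

Because nothing is read in the constructor, a range that is created but never iterated does no work at all.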


> If I use the range, the destructor seems to be called many,
> many times.

Then you probably make many copies.

> In some ways, this problem is generalizable to all InputRanges
> that represent a file or record stream.

Yep, and that's why I recommend having a look at e.g.
std.stdio.File:


- it does its initialization in the constructor [1]
- it uses reference counting for its allocated space and pointers
[2, 3] (File is often shared by default, which is why atomic
reference counting is necessary here)


Have a look at this minimal example of reference-counting:

https://run.dlang.io/is/GF5vbC
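
For readers who cannot open the link, here is a minimal sketch of the same idea. It is not the code behind the link. `fakeItrCreate` and `fakeItrDestroy` are hypothetical stand-ins for the C library's create/destroy calls (in dhtslib the destroy call would be `tbx_itr_destroy`), and `CountedRange`, `Payload`, `destroyCalls` and `releasesAfterCopies` are invented for illustration. The count is not atomic, so this assumes single-threaded use.

```d
import core.stdc.stdlib : free, malloc;

int destroyCalls; // how many times the handle was really released

// Hypothetical stand-ins for the C library's iterator create/destroy calls.
int* fakeItrCreate()
{
    auto p = cast(int*) malloc(int.sizeof);
    *p = 0;
    return p;
}

void fakeItrDestroy(int* p)
{
    free(p);
    ++destroyCalls;
}

struct CountedRange
{
    private static struct Payload
    {
        int* itr;    // handle owned by the C library
        size_t refs; // number of live copies sharing the handle
    }

    private Payload* payload;
    private int current;
    private int limit;

    this(int limit)
    {
        payload = cast(Payload*) malloc(Payload.sizeof);
        payload.itr = fakeItrCreate();
        payload.refs = 1;
        this.limit = limit;
    }

    // Postblit: a copy shares the handle, so only bump the count.
    this(this)
    {
        if (payload !is null)
            ++payload.refs;
    }

    // Release the handle only when the last copy goes away.
    ~this()
    {
        if (payload is null)
            return;
        if (--payload.refs == 0)
        {
            fakeItrDestroy(payload.itr);
            free(payload);
        }
        payload = null;
    }

    bool empty() const { return current >= limit; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// Makes two copies of one range and reports how often the handle was freed.
int releasesAfterCopies()
{
    destroyCalls = 0;
    {
        auto r = CountedRange(3);
        foreach (x; r) {} // foreach iterates over a copy of r
        auto other = r;   // an explicit second copy
    }
    return destroyCalls;
}
```

The destructor still runs once per copy, but the handle is destroyed only by the last one, which is the "guard" from option (a).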

The copies you see go away when the struct is passed by reference:

https://run.dlang.io/is/Uhs5Bt
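
Again as a sketch rather than the linked code: counting postblit calls shows the difference between handing a range to a function by value and by `ref`. All names here (`Tracked`, `copies`, `sumByValue`, `sumByRef` and the two helper functions) are invented for this example.

```d
int copies; // incremented by every postblit

struct Tracked
{
    int current;
    int limit;

    this(this) { ++copies; }

    bool empty() const { return current >= limit; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// Takes the range by value: the callee works on its own copy.
int sumByValue(Tracked r)
{
    int total;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

// Takes the range by reference: the caller's range is consumed directly.
int sumByRef(ref Tracked r)
{
    int total;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

// Returns the number of postblit calls caused by a by-value call.
int copiesWhenPassedByValue()
{
    copies = 0;
    auto r = Tracked(0, 3);
    immutable total = sumByValue(r);
    assert(total == 3);
    assert(!r.empty); // the original was left untouched
    return copies;
}

// Returns the number of postblit calls caused by a by-ref call.
int copiesWhenPassedByRef()
{
    copies = 0;
    auto r = Tracked(0, 3);
    immutable total = sumByRef(r);
    assert(total == 3);
    assert(r.empty); // the caller's range was consumed in place
    return copies;
}
```

Note the trade-off: the by-value call leaves the caller's range unconsumed, while the by-ref call advances it.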

[1] https://github.com/dlang/phobos/blob/565a51f8c6e8b703c0b625568a6f14473345f5d8/std/stdio.d#L394
[2] https://github.com/dlang/phobos/blob/565a51f8c6e8b703c0b625568a6f14473345f5d8/std/stdio.d#L474
[3] https://github.com/dlang/phobos/blob/565a51f8c6e8b703c0b625568a6f14473345f5d8/std/stdio.d#L835


Re: InputRange help: (1) repeated dtor calls and (2) managing resources needing free()

2018-08-12 Thread James Blachly via Digitalmars-d-learn

On Thursday, 14 June 2018 at 00:42:25 UTC, James Blachly wrote:

> ...
> I assume the (apparent) lack of parity between ctor and dtor is
> because the "default postblit" (which I figured out for a
> struct means an empty `this(this)` ctor) is called when a copy
> is made. My understanding is that I cannot disable the default
> postblit and still act as a range, correct? Should I be
> overloading this?
>
> 2. Directly related to the above, I need, when the range is
> consumed, to free() the underlying library's iterator handle.
> Naively, I had the destructor do this, but obviously with
> multiple calls to ~this I end up with an error free()'ing a
> pointer that is no longer alloc'd.  What is the correct way to
> handle this situation in D?
>
> Other Range and destructor advice generally (e.g., "You should
> totally change your design or approach to X instead") is always
> welcomed.
>
> James


I think I have a handle on #1 (copy of the range is made for 
consumption which is why dtor is called more often than ctor), 
but would still be interested in advice regarding #2 (as well as 
general Range and dtor advice).


Here:
https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/tabix.d#L98
I need to free the library's iterator, but the Range's destructor
is the wrong place to do this, otherwise memory is freed more than
once.


Is it a better approach to (a) somehow guard the call to
tbx_itr_destroy or (b) create a postblit that creates a new
iterator and pointer? (or (c), None of the above) As above, my
understanding is that disabling the default postblit prohibits
acting as a Range.


Thanks in advance


InputRange help: (1) repeated dtor calls and (2) managing resources needing free()

2018-06-13 Thread James Blachly via Digitalmars-d-learn

Hi all,

I now really appreciate the power Ranges provide and am an avid 
consumer, but am only slowly becoming accustomed to implementing 
my own.


In the present problem, I am writing a binding to a C library 
(htslib) that provides many functions related to high-throughput 
sequencing files. One of these functions is for rapid indexed 
lookup into multi-GB files. The library provides a handle to an 
iterator which must be supplied to a "get next matching row" type 
function, which overall seems perfect for implementation as a 
range.  You can see my naive implementation here:


https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/tabix.d

Note that TabixIndexedFile::region returns an InputRange; in the 
original implementation, this Range preloaded the first record 
(the ctor called popFirst()), but ultimately I realized this was 
not workable because copies of the object would always be 
non-empty. In some ways, this problem is generalizable to all 
InputRanges that represent a file or record stream.



My problems now are at least twofold.

1. If I use the range, the destructor seems to be called many, 
many times. This is directly related to problem 2, below, but I 
would be interested to understand why this is happening 
generally. For example, see:

https://github.com/blachlylab/dhtslib/blob/master/test/tabix_gffreader.d
Here, when I create the range but do not consume it, the ctor and 
dtor are called once each, as expected. However, if I 
foreach(line; r) { } the destructor is called twice. If I reason 
through this, it is because use of the range created a copy to 
consume. (?) However, if instead, I writeln( r ), the destructor 
is called *five* times. I cannot understand the reason for this, 
unless it is black magic required by writeln().


I assume the (apparent) lack of parity between ctor and dtor is 
because the "default postblit" (which I figured out for a struct 
means an empty `this(this)` ctor) is called when a copy is made. 
My understanding is that I cannot disable the default postblit 
and still act as a range, correct? Should I be overloading this?


2. Directly related to the above, I need, when the range is 
consumed, to free() the underlying library's iterator handle. 
Naively, I had the destructor do this, but obviously with 
multiple calls to ~this I end up with an error free()'ing a 
pointer that is no longer alloc'd.  What is the correct way to 
handle this situation in D?


Other Range and destructor advice generally (e.g., "You should 
totally change your design or approach to X instead") is always 
welcomed.


James