Re: Output range with custom string type

2017-08-31 Thread Moritz Maxeiner via Digitalmars-d-learn

On Thursday, 31 August 2017 at 07:06:26 UTC, Jacob Carlborg wrote:

On 2017-08-29 19:35, Moritz Maxeiner wrote:


 void put(T t)
 {
     if (!store)
     {
         // Allocate only once for "small" vectors
         store = alloc.makeArray!T(8);
         if (!store) onOutOfMemoryError();
     }
     else if (length == store.length)
     {
         // Growth factor of 1.5
         auto expanded = alloc.expandArray!T(store, store.length / 2);
         if (!expanded) onOutOfMemoryError();
     }
     assert (length < store.length);
     moveEmplace(t, store[length++]);
 }


What's the reason to use "moveEmplace" instead of just 
assigning to the array: "store[length++] = t" ?


The `move` part is to support non-copyable types (i.e. T with 
`@disable this(this)`), such as another owning container 
(assigning would generally try to create a copy).
The `emplace` part is because the destination `store[length]` has 
been default initialized either by makeArray or expandArray and 
it doesn't need to be destroyed (a pure move would destroy 
`store[length]` if T has a destructor).
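
To make the point about non-copyable types concrete, here is a minimal sketch. `Owner` and `placeInto` are hypothetical names invented for this illustration and are not part of the vector above; the destructor is only a stand-in for releasing a resource.

```d
import std.algorithm.mutation : moveEmplace;

// Hypothetical non-copyable type standing in for an owning container.
struct Owner
{
    int id;
    @disable this(this); // copying an Owner is a compile-time error
    ~this() { id = 0; }  // stand-in for releasing an owned resource
}

// Creating a copy of an existing Owner does not compile.
static assert(!__traits(compiles, { Owner a; Owner b = a; }));

// Hypothetical helper mirroring the last line of put():
// moves `item` into `slot` without copying `item` and without
// running a destructor on whatever `slot` held before.
void placeInto(ref Owner item, ref Owner slot)
{
    moveEmplace(item, slot);
}
```

Because `Owner` has a destructor, `moveEmplace` resets the source to `Owner.init` after the move, so the moved-from value can be destroyed safely later.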


Re: Output range with custom string type

2017-08-31 Thread Jacob Carlborg via Digitalmars-d-learn

On 2017-08-29 19:35, Moritz Maxeiner wrote:


 void put(T t)
 {
     if (!store)
     {
         // Allocate only once for "small" vectors
         store = alloc.makeArray!T(8);
         if (!store) onOutOfMemoryError();
     }
     else if (length == store.length)
     {
         // Growth factor of 1.5
         auto expanded = alloc.expandArray!T(store, store.length / 2);
         if (!expanded) onOutOfMemoryError();
     }
     assert (length < store.length);
     moveEmplace(t, store[length++]);
 }


What's the reason to use "moveEmplace" instead of just assigning to the 
array: "store[length++] = t" ?


--
/Jacob Carlborg


Re: Output range with custom string type

2017-08-30 Thread Jacob Carlborg via Digitalmars-d-learn

On 2017-08-29 19:35, Moritz Maxeiner wrote:

On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:

[...]

But if I keep the range internal, can't I just do the allocation 
inside the range and only use "formattedWrite", instead of using both 
formattedWrite and sformat and going through the data twice? Then of 
course the final size is not known before allocating.


Certainly, that's what dynamic arrays (aka vectors, e.g. std::vector in 
C++ STL) are for:


---
import core.exception;

import std.stdio;
import std.experimental.allocator;
import std.algorithm;

struct PoorMansVector(T)
{
private:
    T[]    store;
    size_t length;
    IAllocator alloc;
public:
    @disable this(this);
    this(IAllocator alloc)
    {
        this.alloc = alloc;
    }
    ~this()
    {
        if (store)
        {
            alloc.dispose(store);
            store = null;
        }
    }
    void put(T t)
    {
        if (!store)
        {
            // Allocate only once for "small" vectors
            store = alloc.makeArray!T(8);
            if (!store) onOutOfMemoryError();
        }
        else if (length == store.length)
        {
            // Growth factor of 1.5
            auto expanded = alloc.expandArray!T(store, store.length / 2);
            if (!expanded) onOutOfMemoryError();
        }
        assert (length < store.length);
        moveEmplace(t, store[length++]);
    }
    T[] release()
    {
        auto elements = store[0..length];
        store = null;
        return elements;
    }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    auto r = PoorMansVector!char(alloc);
    (&r).formattedWrite!"'%s'"(value); // do not copy the range
    return r.release();
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---

Do be aware that the above vector is named "poor man's vector" for a 
reason: it's a hasty write-up from memory and is sure to contain bugs.
For better vector implementations you can look at collection libraries 
such as EMSI containers; my own attempt at a DbI vector container can be 
found here [1]


[1] 
https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d 



Thanks.

--
/Jacob Carlborg


Re: Output range with custom string type

2017-08-29 Thread Moritz Maxeiner via Digitalmars-d-learn

On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:

[...]

But if I keep the range internal, can't I just do the 
allocation inside the range and only use "formattedWrite", 
instead of using both formattedWrite and sformat and going through 
the data twice? Then of course the final size is not known 
before allocating.


Certainly, that's what dynamic arrays (aka vectors, e.g. 
std::vector in C++ STL) are for:


---
import core.exception;

import std.stdio;
import std.experimental.allocator;
import std.algorithm;

struct PoorMansVector(T)
{
private:
    T[]    store;
    size_t length;
    IAllocator alloc;
public:
    @disable this(this);
    this(IAllocator alloc)
    {
        this.alloc = alloc;
    }
    ~this()
    {
        if (store)
        {
            alloc.dispose(store);
            store = null;
        }
    }
    void put(T t)
    {
        if (!store)
        {
            // Allocate only once for "small" vectors
            store = alloc.makeArray!T(8);
            if (!store) onOutOfMemoryError();
        }
        else if (length == store.length)
        {
            // Growth factor of 1.5
            auto expanded = alloc.expandArray!T(store, store.length / 2);
            if (!expanded) onOutOfMemoryError();
        }
        assert (length < store.length);
        moveEmplace(t, store[length++]);
    }
    T[] release()
    {
        auto elements = store[0..length];
        store = null;
        return elements;
    }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    auto r = PoorMansVector!char(alloc);
    (&r).formattedWrite!"'%s'"(value); // do not copy the range
    return r.release();
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---

Do be aware that the above vector is named "poor man's vector" 
for a reason: it's a hasty write-up from memory and is sure 
to contain bugs.
For better vector implementations you can look at collection 
libraries such as EMSI containers; my own attempt at a DbI vector 
container can be found here [1]


[1] 
https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d


Re: Output range with custom string type

2017-08-29 Thread Jacob Carlborg via Digitalmars-d-learn

On 2017-08-28 23:45, Moritz Maxeiner wrote:

If you want the caller to be just in charge of allocation, that's what 
std.experimental.allocator provides. In this case, I would polish up the 
old "format once to get the length, allocate, format second time into 
allocated buffer" method used with snprintf for D:


--- test.d ---
import std.stdio;
import std.experimental.allocator;

struct CountingOutputRange
{
private:
    size_t _count;
public:
    size_t count() { return _count; }
    void put(char c) { _count++; }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    CountingOutputRange r;
    (&r).formattedWrite!"'%s'"(value); // do not copy the range

    auto s = alloc.makeArray!char(r.count);
    scope (failure) alloc.dispose(s);

    // This should only throw if the user provided allocator returned less
    // memory than was requested
    return s.sformat!"'%s'"(value);
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---


I guess that would work.

But if I keep the range internal, can't I just do the allocation inside 
the range and only use "formattedWrite", instead of using both 
formattedWrite and sformat and going through the data twice? Then of course 
the final size is not known before allocating.


--
/Jacob Carlborg


Re: Output range with custom string type

2017-08-28 Thread Cecil Ward via Digitalmars-d-learn

On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
I'm working on some code that sanitizes and converts values of 
different types to strings. I thought it would be a good idea 
to wrap the sanitized string in a struct to have some type 
safety. Ideally it should not be possible to create this type 
without going through the sanitizing functions.


The problem I have is that I would like these functions to push 
up the allocation decision to the caller. Internally these 
functions use formattedWrite. I thought the natural design 
would be that the sanitize functions take an output range and 
pass that to formattedWrite.


Here's a really simple example:

import std.stdio : writeln;

struct Range
{
    void put(char c)
    {
        writeln(c);
    }
}

void sanitize(OutputRange)(string value, OutputRange range)
{
    import std.format : formattedWrite;
    range.formattedWrite!"'%s'"(value);
}

void main()
{
    Range range;
    sanitize("foo", range);
}

The problem now is that the data is passed one char at a time 
to the range. This means that if the user implements a custom 
output range, the user is in full control of the data. It will 
now be very easy for the user to make a mistake or manipulate 
the data on purpose, making the whole idea of the sanitized 
type pointless.


Any suggestions how to fix this or a better idea?


Q: Is it an option to let the caller provide all the storage in an 
oversized fixed-length buffer? You could add a second helper 
function that computes and returns a suitably pessimistic 
(over-the-top) maximum for the required length; it could be called 
once beforehand to establish the required buffer size (or to check 
it). This is the technique I am using right now. My sizing function 
is ridiculously fast, as I am lucky in my particular use case.
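
A rough sketch of that buffer-plus-sizing-helper idea, applied to the `'%s'` format used elsewhere in this thread. The names `maxSanitizedLength` and `sanitizeInto` are hypothetical and only serve the illustration; for this particular format the bound happens to be exact, while a real sizing function may well overestimate.

```d
import std.format : sformat;

// Hypothetical sizing helper: upper bound on what sanitizeInto writes.
// For the '%s' format used here that is the value plus two quote characters.
size_t maxSanitizedLength(string value)
{
    return value.length + 2;
}

// Hypothetical variant of sanitize that formats into caller-owned storage
// and returns the slice of `buffer` that was actually filled.
char[] sanitizeInto(char[] buffer, string value)
{
    return buffer.sformat!"'%s'"(value);
}
```

The caller would size (or check) its buffer with `maxSanitizedLength` once and then pass a slice of it to `sanitizeInto`; if the buffer is too small, `sformat` fails rather than writing past the end.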


Re: Output range with custom string type

2017-08-28 Thread Moritz Maxeiner via Digitalmars-d-learn

On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
I'm working on some code that sanitizes and converts values of 
different types to strings. I thought it would be a good idea 
to wrap the sanitized string in a struct to have some type 
safety. Ideally it should not be possible to create this type 
without going through the sanitizing functions.


The problem I have is that I would like these functions to push 
up the allocation decision to the caller. Internally these 
functions use formattedWrite. I thought the natural design 
would be that the sanitize functions take an output range and 
pass that to formattedWrite.


[...]

Any suggestions how to fix this or a better idea?


If you want the caller to be just in charge of allocation, that's 
what std.experimental.allocator provides. In this case, I would 
polish up the old "format once to get the length, allocate, 
format second time into allocated buffer" method used with 
snprintf for D:


--- test.d ---
import std.stdio;
import std.experimental.allocator;

struct CountingOutputRange
{
private:
    size_t _count;
public:
    size_t count() { return _count; }
    void put(char c) { _count++; }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    CountingOutputRange r;
    (&r).formattedWrite!"'%s'"(value); // do not copy the range

    auto s = alloc.makeArray!char(r.count);
    scope (failure) alloc.dispose(s);

    // This should only throw if the user provided allocator returned less
    // memory than was requested
    return s.sformat!"'%s'"(value);
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---


Output range with custom string type

2017-08-28 Thread Jacob Carlborg via Digitalmars-d-learn
I'm working on some code that sanitizes and converts values of different 
types to strings. I thought it would be a good idea to wrap the 
sanitized string in a struct to have some type safety. Ideally it should 
not be possible to create this type without going through the sanitizing 
functions.
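
As a rough illustration of the wrapper idea, here is a minimal sketch. The names `Sanitized` and `text` are hypothetical, and the sketch sets the allocation question aside by using the GC-allocating `format`. Note that `private` in D is module-level, so the restriction only applies to code outside the module that defines the type, and `Sanitized.init` remains reachable; this narrows the ways to create the type rather than eliminating them.

```d
import std.format : format;

// Hypothetical wrapper type for a string that has been sanitized.
struct Sanitized
{
    private string _value;

    @disable this();                                // no default construction
    private this(string value) { _value = value; } // only this module may construct

    string text() const { return _value; }
}

// The only public way (from other modules) to obtain a Sanitized.
Sanitized sanitize(string value)
{
    return Sanitized(format!"'%s'"(value));
}
```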


The problem I have is that I would like these functions to push up the 
allocation decision to the caller. Internally these functions use 
formattedWrite. I thought the natural design would be that the sanitize 
functions take an output range and pass that to formattedWrite.


Here's a really simple example:

import std.stdio : writeln;

struct Range
{
    void put(char c)
    {
        writeln(c);
    }
}

void sanitize(OutputRange)(string value, OutputRange range)
{
    import std.format : formattedWrite;
    range.formattedWrite!"'%s'"(value);
}

void main()
{
    Range range;
    sanitize("foo", range);
}

The problem now is that the data is passed one char at a time to the 
range. This means that if the user implements a custom output range, the 
user is in full control of the data. It will now be very easy for the 
user to make a mistake or manipulate the data on purpose, making the 
whole idea of the sanitized type pointless.


Any suggestions how to fix this or a better idea?

--
/Jacob Carlborg