Re: Output range with custom string type
On Thursday, 31 August 2017 at 07:06:26 UTC, Jacob Carlborg wrote: On 2017-08-29 19:35, Moritz Maxeiner wrote: void put(T t) { if (!store) { // Allocate only once for "small" vectors store = alloc.makeArray!T(8); if (!store) onOutOfMemoryError(); } else if (length == store.length) { // Growth factor of 1.5 auto expanded = alloc.expandArray!char(store, store.length / 2); if (!expanded) onOutOfMemoryError(); } assert (length < store.length); moveEmplace(t, store[length++]); } What's the reason to use "moveEmplace" instead of just assigning to the array: "store[length++] = t" ? The `move` part is to support non-copyable types (i.e. T with `@disable this(this)`), such as another owning container (assigning would generally try to create a copy). The `emplace` part is because the destination `store[length]` has been default initialized either by makeArray or expandArray and it doesn't need to be destroyed (a pure move would destroy `store[length]` if T has a destructor).
Re: Output range with custom string type
On 2017-08-29 19:35, Moritz Maxeiner wrote: void put(T t) { if (!store) { // Allocate only once for "small" vectors store = alloc.makeArray!T(8); if (!store) onOutOfMemoryError(); } else if (length == store.length) { // Growth factor of 1.5 auto expanded = alloc.expandArray!char(store, store.length / 2); if (!expanded) onOutOfMemoryError(); } assert (length < store.length); moveEmplace(t, store[length++]); } What's the reason to use "moveEmplace" instead of just assigning to the array: "store[length++] = t" ? -- /Jacob Carlborg
Re: Output range with custom string type
On 2017-08-29 19:35, Moritz Maxeiner wrote: On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote: [...] But if I keep the range internal, can't I just do the allocation inside the range and only use "formattedWrite"? Instead of using both formattedWrite and sformat and go through the data twice. Then of course the final size is not known before allocating. Certainly, that's what dynamic arrays (aka vectors, e.g. std::vector in C++ STL) are for: --- import core.exception; import std.stdio; import std.experimental.allocator; import std.algorithm; struct PoorMansVector(T) { private: T[] store; size_t length; IAllocator alloc; public: @disable this(this); this(IAllocator alloc) { this.alloc = alloc; } ~this() { if (store) { alloc.dispose(store); store = null; } } void put(T t) { if (!store) { // Allocate only once for "small" vectors store = alloc.makeArray!T(8); if (!store) onOutOfMemoryError(); } else if (length == store.length) { // Growth factor of 1.5 auto expanded = alloc.expandArray!char(store, store.length / 2); if (!expanded) onOutOfMemoryError(); } assert (length < store.length); moveEmplace(t, store[length++]); } char[] release() { auto elements = store[0..length]; store = null; return elements; } } char[] sanitize(string value, IAllocator alloc) { import std.format : formattedWrite, sformat; auto r = PoorMansVector!char(alloc); (&r).formattedWrite!"'%s'"(value); // do not copy the range return r.release(); } void main() { auto s = sanitize("foo", theAllocator); scope (exit) theAllocator.dispose(s); writeln(s); } --- Do be aware that the above vector is named "poor man's vector" for a reason, that's a hasty write down from memory and is sure to contain bugs. For better vector implementations you can use at collection libraries such as EMSI containers; my own attempt at a DbI vector container can be found here [1] [1] https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d Thanks. -- /Jacob Carlborg
Re: Output range with custom string type
On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote: [...] But if I keep the range internal, can't I just do the allocation inside the range and only use "formattedWrite"? Instead of using both formattedWrite and sformat and go through the data twice. Then of course the final size is not known before allocating. Certainly, that's what dynamic arrays (aka vectors, e.g. std::vector in C++ STL) are for: --- import core.exception; import std.stdio; import std.experimental.allocator; import std.algorithm; struct PoorMansVector(T) { private: T[]store; size_t length; IAllocator alloc; public: @disable this(this); this(IAllocator alloc) { this.alloc = alloc; } ~this() { if (store) { alloc.dispose(store); store = null; } } void put(T t) { if (!store) { // Allocate only once for "small" vectors store = alloc.makeArray!T(8); if (!store) onOutOfMemoryError(); } else if (length == store.length) { // Growth factor of 1.5 auto expanded = alloc.expandArray!char(store, store.length / 2); if (!expanded) onOutOfMemoryError(); } assert (length < store.length); moveEmplace(t, store[length++]); } char[] release() { auto elements = store[0..length]; store = null; return elements; } } char[] sanitize(string value, IAllocator alloc) { import std.format : formattedWrite, sformat; auto r = PoorMansVector!char(alloc); (&r).formattedWrite!"'%s'"(value); // do not copy the range return r.release(); } void main() { auto s = sanitize("foo", theAllocator); scope (exit) theAllocator.dispose(s); writeln(s); } --- Do be aware that the above vector is named "poor man's vector" for a reason, that's a hasty write down from memory and is sure to contain bugs. For better vector implementations you can use at collection libraries such as EMSI containers; my own attempt at a DbI vector container can be found here [1] [1] https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d
Re: Output range with custom string type
On 2017-08-28 23:45, Moritz Maxeiner wrote: If you want the caller to be just in charge of allocation, that's what std.experimental.allocator provides. In this case, I would polish up the old "format once to get the length, allocate, format second time into allocated buffer" method used with snprintf for D: --- test.d --- import std.stdio; import std.experimental.allocator; struct CountingOutputRange { private: size_t _count; public: size_t count() { return _count; } void put(char c) { _count++; } } char[] sanitize(string value, IAllocator alloc) { import std.format : formattedWrite, sformat; CountingOutputRange r; (&r).formattedWrite!"'%s'"(value); // do not copy the range auto s = alloc.makeArray!char(r.count); scope (failure) alloc.dispose(s); // This should only throw if the user provided allocator returned less // memory than was requested return s.sformat!"'%s'"(value); } void main() { auto s = sanitize("foo", theAllocator); scope (exit) theAllocator.dispose(s); writeln(s); } -- I guess that would work. But if I keep the range internal, can't I just do the allocation inside the range and only use "formattedWrite"? Instead of using both formattedWrite and sformat and go through the data twice. Then of course the final size is not known before allocating. -- /Jacob Carlborg
Re: Output range with custom string type
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote: I'm working on some code that sanitizes and converts values of different types to strings. I thought it would be a good idea to wrap the sanitized string in a struct to have some type safety. Ideally it should not be possible to create this type without going through the sanitizing functions. The problem I have is that I would like these functions to push up the allocation decision to the caller. Internally these functions use formattedWrite. I thought the natural design would be that the sanitize functions take an output range and pass that to formattedWrite. Here's a really simple example: import std.stdio : writeln; struct Range { void put(char c) { writeln(c); } } void sanitize(OutputRange)(string value, OutputRange range) { import std.format : formattedWrite; range.formattedWrite!"'%s'"(value); } void main() { Range range; sanitize("foo", range); } The problem now is that the data is passed one char at the time to the range. Meaning that if the user implements a custom output range, the user is in full control of the data. It will now be very easy for the user to make a mistake or manipulate the data on purpose. Making the whole idea of the sanitized type pointless. Any suggestions how to fix this or a better idea? Q is it an option to let the caller provide all the storage in an oversized fixed-length buffer? You could add a second helper function to compute and return a suitable safely pessimistic ott max value for the length reqd which could be called once beforehand to establish the reqd buffer size (or check it). This is the technique I am using right now. My sizing function is ridiculously fast as I am lucky in the particular use-case.
Re: Output range with custom string type
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote: I'm working on some code that sanitizes and converts values of different types to strings. I thought it would be a good idea to wrap the sanitized string in a struct to have some type safety. Ideally it should not be possible to create this type without going through the sanitizing functions. The problem I have is that I would like these functions to push up the allocation decision to the caller. Internally these functions use formattedWrite. I thought the natural design would be that the sanitize functions take an output range and pass that to formattedWrite. [...] Any suggestions how to fix this or a better idea? If you want the caller to be just in charge of allocation, that's what std.experimental.allocator provides. In this case, I would polish up the old "format once to get the length, allocate, format second time into allocated buffer" method used with snprintf for D: --- test.d --- import std.stdio; import std.experimental.allocator; struct CountingOutputRange { private: size_t _count; public: size_t count() { return _count; } void put(char c) { _count++; } } char[] sanitize(string value, IAllocator alloc) { import std.format : formattedWrite, sformat; CountingOutputRange r; (&r).formattedWrite!"'%s'"(value); // do not copy the range auto s = alloc.makeArray!char(r.count); scope (failure) alloc.dispose(s); // This should only throw if the user provided allocator returned less // memory than was requested return s.sformat!"'%s'"(value); } void main() { auto s = sanitize("foo", theAllocator); scope (exit) theAllocator.dispose(s); writeln(s); } --
Output range with custom string type
I'm working on some code that sanitizes and converts values of different types to strings. I thought it would be a good idea to wrap the sanitized string in a struct to have some type safety. Ideally it should not be possible to create this type without going through the sanitizing functions. The problem I have is that I would like these functions to push up the allocation decision to the caller. Internally these functions use formattedWrite. I thought the natural design would be that the sanitize functions take an output range and pass that to formattedWrite. Here's a really simple example: import std.stdio : writeln; struct Range { void put(char c) { writeln(c); } } void sanitize(OutputRange)(string value, OutputRange range) { import std.format : formattedWrite; range.formattedWrite!"'%s'"(value); } void main() { Range range; sanitize("foo", range); } The problem now is that the data is passed one char at the time to the range. Meaning that if the user implements a custom output range, the user is in full control of the data. It will now be very easy for the user to make a mistake or manipulate the data on purpose. Making the whole idea of the sanitized type pointless. Any suggestions how to fix this or a better idea? -- /Jacob Carlborg