[Issue 17881] Provide mechanism to preallocate memory from the GC

2023-10-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Steven Schveighoffer  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/sh
   ||ow_bug.cgi?id=2504

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2022-12-17 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Iain Buclaw  changed:

   What|Removed |Added

   Priority|P1  |P4

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2018-10-28 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Stanislav Blinov  changed:

   What|Removed |Added

 CC||stanislav.bli...@gmail.com

--- Comment #13 from Stanislav Blinov  ---
Created attachment 1713
  --> https://issues.dlang.org/attachment.cgi?id=1713=edit
Example of paged allocation with GC

Jumping in a year later almost to the hour, I'll start with a disclaimer: I
agree with Andrei that such functionality in general is best left for a custom
allocator API.
That said, having a straightforward utility for chunked GC allocations in
Phobos or DRuntime would benefit users who don't necessarily want to import or
deal with custom allocators for "just that one allocation" (scripts,
prototyping, etc.).

Attached is an implementation of the sketch suggested in previous comments; see
comments in the code about certain implementation/DRuntime quirks. This
implementation on my machine is about 10x faster than piecewise allocation with
both dmd and ldc; it's still slower, of course, than a wholesale preallocation,
but at least should be on the same order of magnitude, unless you go for
astronomical amounts of instances.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-28 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #12 from Rainer Schuetze  ---
The current GC already does something in that regard as it pre-assigns a new
memory page to a free-list of elements of the size of the last request.

The GC locking overhead can be reduced by helping the inliner a bit with some
assembly. I have tried that for Win64 recently and the benchmarks showed an
improvement of about 5%.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-25 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #11 from Steven Schveighoffer  ---
(In reply to Andrei Alexandrescu from comment #10)
> This issue should be closed. A good allocator should be able to handle
> repeated allocations of the same size without this kind of hint.

I'm all for a solution that doesn't need hints. As far as I can tell, that
solution hasn't been disclosed yet. If you know how to write a "good allocator"
I really think you should do it, it would do wonders for D's GC performance.
Certainly, feel free to take this ER down in the event that becomes available.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-24 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #10 from Andrei Alexandrescu  ---
(In reply to Steven Schveighoffer from comment #8)
> What I am looking for,
> though, is the high-level API of "I have 20 million T's I want to allocate,
> give me them properly allocated and typed one at a time".

This issue should be closed. A good allocator should be able to handle repeated
allocations of the same size without this kind of hint.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-11 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #9 from safety0ff.bugz  ---
(In reply to Steven Schveighoffer from comment #8)
> Internally, I think something like this is needed. What I am looking for,
> though, is the high-level API of "I have 20 million T's I want to allocate,
> give me them properly allocated and typed one at a time". I don't
> necessarily even think you need the TypeInfo for the low level API. Perhaps
> the bit attributes can even be done as you allocate each block.

Here's the rough sketch of one idea:

void[] gc_nmalloc(size_t num, size_t size, void[] prev = [], uint ba = 0, const
TypeInfo ti = null)
// prev is used for temporary pinning/unpinning a memory range

struct GC // in core.memory
{
  auto NMalloc(T)(size_t num) { // convenience function
 int loop(int delegate(T*) dg) {
   void[] prev; size_t remain = num;
   scope(exit) gc_nmalloc(0,T.sizeof,prev); // unpin last chunk
   int result;
   while (remain) {
 prev = gc_nmalloc(remain, T.sizeof, prev, ...);
 auto chunk = cast(T*)prev.ptr[0 .. prev.length/T.sizeof];
 foreach (ref e; chunk)
   if ((result = dg())) return result;
 remain -= chunk.length;
   }
   return result;
 }
 return 
  }
}

So user code could look like:
foreach (e; GC.NMalloc(T)(20_000_000))
// do something with allocated e

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-11 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #8 from Steven Schveighoffer  ---
(In reply to safety0ff.bugz from comment #7)
> I think the API should obviously should have the size of the elements as a
> separate arguments and provide as much information as malloc e.g.:
> GC.preload(size_t num, size_t size, uint ba = 0, const TypeInfo ti = null)

Internally, I think something like this is needed. What I am looking for,
though, is the high-level API of "I have 20 million T's I want to allocate,
give me them properly allocated and typed one at a time". I don't necessarily
even think you need the TypeInfo for the low level API. Perhaps the bit
attributes can even be done as you allocate each block.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-11 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

safety0ff.bugz  changed:

   What|Removed |Added

 CC||safety0ff.b...@gmail.com

--- Comment #7 from safety0ff.bugz  ---
(In reply to Steven Schveighoffer from comment #6)
> > If we add thread local free-lists to the GC, the overhead of allocating
> > these from the GC instead of caching them in the AA would be rather small.
> 
> Agreed, I think keeping the lists inside the GC is the most useful, and does
> not expose any implementation details to the user.

I think this is the only sane way of doing it. i.e. without over constraining
the GC implementation or exposing the user to implementation details.

I think the API should obviously should have the size of the elements as a
separate arguments and provide as much information as malloc e.g.:
GC.preload(size_t num, size_t size, uint ba = 0, const TypeInfo ti = null)

Otherwise it relies on hope that the GC will pull from the same pool.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-09 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Alexandru Razvan Caciulescu  changed:

   What|Removed |Added

 CC||alexandru.razva...@gmail.co
   ||m
   Assignee|nob...@puremagic.com|alexandru.razva...@gmail.co
   ||m

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-07 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #6 from Steven Schveighoffer  ---
(In reply to Rainer Schuetze from comment #5)
> A slightly simpler API could be to add allocating N same-sized chunks from
> the GC that returns them in a free-list-like chain.

I think an array is best, since the data already is an array, and a free-list
requires building a linked list before-hand.

> I agree with Andrei that we should not complicate the GC interface for every
> possible allocation pattern a user might want to optimize for, though.

Hm... I'm trying to find the most generic interface for this. It's not
necessarily limited to AA, as allocating a bunch of blocks in a loop isn't an
uncommon thing.

If there was a way to "re-flavor" the blocks from one giant block into
individual ones, then we could do this outside the GC.

> If you call GC.reserve(20_000_000*(Key.sizeof+Value.sizeof)) before
> inserting elements, there should be no collection while filling the AA.

Limiting the collection in the example I posted shaves off about 300-400msec,
but it still is 1.5 seconds vs. 0.1 for the local array version.

> If we add thread local free-lists to the GC, the overhead of allocating
> these from the GC instead of caching them in the AA would be rather small.

Agreed, I think keeping the lists inside the GC is the most useful, and does
not expose any implementation details to the user.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-07 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Rainer Schuetze  changed:

   What|Removed |Added

 CC||r.sagita...@gmx.de

--- Comment #5 from Rainer Schuetze  ---
A slightly simpler API could be to add allocating N same-sized chunks from the
GC that returns them in a free-list-like chain.

I agree with Andrei that we should not complicate the GC interface for every
possible allocation pattern a user might want to optimize for, though.

If you call GC.reserve(20_000_000*(Key.sizeof+Value.sizeof)) before inserting
elements, there should be no collection while filling the AA.

If we add thread local free-lists to the GC, the overhead of allocating these
from the GC instead of caching them in the AA would be rather small.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-06 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #4 from Steven Schveighoffer  ---
Stevens-MacBook-Pro:testd steves$ cat testpreallocate.d
struct S
{
S *next;
}

void main()
{
version(freelist)
{
S* head = null;
foreach(i; 0 .. 20_000_000)
head = new S(head);
}
version(preallocate)
{
S* head = null;
auto x = new S[20_000_000];
foreach(i; 0 .. 20_000_000)
{
x[i] = S(head);
head = &(x[i]);
}
}
}
Stevens-MacBook-Pro:testd steves$ dmd -O -release testpreallocate.d
-version=freelist
Stevens-MacBook-Pro:testd steves$ time ./testpreallocate

real0m1.869s
user0m1.750s
sys0m0.114s
Stevens-MacBook-Pro:testd steves$ dmd -O -release testpreallocate.d
-version=preallocate
Stevens-MacBook-Pro:testd steves$ time ./testpreallocate

real0m0.111s
user0m0.045s
sys0m0.062s


The point is that the GC is not well-equipped to handle the tight allocation
loop.

The second version has the drawback that all 20 million elements will remain in
memory as long as there is one element still alive.

What I'm looking for is something that has the performance (or similar, I
realize it won't be as good) of the second, but can collect each block
individually like the first.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-06 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #3 from Andrei Alexandrescu  ---
>The use case is different than for GC.allocate.

The semi-joke was this is all needed.

>I want to fill in a structure of nodes, I know I'm going to fill it in with 
>10,000 elements, so I'm going to allocate them all in a loop. But I don't want 
>the GC to keep the ENTIRE thing in memory if just one element is still pointed 
>at. And I don't want to run the GC constantly in my tight loop allocating each 
>node.

The solution is to prefill a freelist, then let go of the extra elements
remaining.

>Think of a tree constructor, or an AA constructor.

>Essentially it's the same as array.reserve, but for individual blocks instead 
>of a contiguous single block.

That's not the job of the allocator. It's the job of the user of the allocator.
The allocator gives you memory. You organize it as you find fit.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-06 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

--- Comment #2 from Steven Schveighoffer  ---
The use case is different than for GC.allocate.

I want to fill in a structure of nodes, I know I'm going to fill it in with
10,000 elements, so I'm going to allocate them all in a loop. But I don't want
the GC to keep the ENTIRE thing in memory if just one element is still pointed
at. And I don't want to run the GC constantly in my tight loop allocating each
node.

Think of a tree constructor, or an AA constructor.

Essentially it's the same as array.reserve, but for individual blocks instead
of a contiguous single block.

--


[Issue 17881] Provide mechanism to preallocate memory from the GC

2017-10-06 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=17881

Andrei Alexandrescu  changed:

   What|Removed |Added

 CC||and...@erdani.com

--- Comment #1 from Andrei Alexandrescu  ---
This seems like a tall order. It's essentially pushing a specific management
need on to the GC, complicating everyone's life in the process.

If an application needs to preallocate, there's an API for that already - it's
called GC.allocate() :o). Then the application needs to do its own bookkeeping.

--