Re: Interior pointers and fast GC

2017-01-22 Thread Shachar Shemesh via Digitalmars-d

On 21/01/17 17:30, Nick Treleaven wrote:

On Saturday, 14 January 2017 at 15:30:42 UTC, Rainer Schuetze wrote:

In addition, you need to lookup the pool anyway to figure out if the
pointer points to non-managed memory (stack, global data, malloc'd
memory).


Makes me wonder about a GC'd language where each pointer is actually a
member of a struct which also has a base allocation pointer. The base
pointer could either only be set for managed allocations, or the low bit
of the base address could be used to indicate such instead*. This would
make for faster GC scans, but would also cause slowdowns when copying
pointers. Pointer arithmetic would be just as efficient though.

(*) With this scheme, pointer arithmetic can be @safe in assert mode.


You are neglecting to account for the cache-locality performance cost. An 
array of pointers would, under this plan, store half as many pointers in 
a single cache line. This means handling such an array would be, give or 
take, half as fast. That's a huge price to pay.


Shachar


Re: Interior pointers and fast GC

2017-01-22 Thread Chris Wright via Digitalmars-d
On Sun, 22 Jan 2017 06:44:47 +0000, Araq wrote:

> On Sunday, 22 January 2017 at 06:28:35 UTC, Chris Wright wrote:
>> On Sun, 22 Jan 2017 05:02:43 +0000, Araq wrote:
>>> It's an O(1) that requires a hash table lookup in general because
>>> allocations can exceed the chunk size and so you cannot just mask the
>>> pointer and look at the chunk header because it might not be a chunk
>>> header at all. Know any production GCs that use hash table lookups for
>>> pointer assignments? Me neither. Ok ok, maybe Go does, it's the only
>>> language with GC that embraces interior pointers as stupid as that is.
>>
>> Overexplaining because I'm tired.
>>
>> If you have fixed-size allocations, you get much better time: O(log
>> #pools) to find the pool, O(1) to find the offset in that pool. The
>> cost is you multiply the number of pools in existence by the number of
>> allocation sizes you decide to support.
>>
>>
> Sure, O(log #pools) + O(1) is "much better" than O(1).

Much better than what's available today, unless I misunderstood the 
current GC again.

A tree lookup with < 100 elements in the tree won't be much slower than a 
hashtable lookup with O(# pages allocated) entries in the table. If you 
control virtual memory more strictly (and have virtual memory to burn -- 
so not on 32-bit), you can put a better constant on the hashtable lookup.

> To shorten the discussion: You're looking for a variant of "card
> marking". No overhead for interior pointers if you do it right. I think;
> I never worked out the details, though.

Card marking is about write barriers. Write barriers are about 
incremental collection, caching part of your past GC run information so 
you can update it using only those pages of memory that have changed 
since last time. It's low granularity because it's essentially free to 
read an entire page of memory once you've read one byte. It's also low 
granularity because one common strategy is to mprotect(3) the pages as 
read-only and install a fault handler, which is an expensive way of doing 
things that you want to repeat fewer times if possible.

It's not what I'm looking for.


Re: Interior pointers and fast GC

2017-01-22 Thread deadalnix via Digitalmars-d

On Sunday, 22 January 2017 at 05:02:43 UTC, Araq wrote:
It's an O(1) that requires a hash table lookup in general 
because allocations can exceed the chunk size and so you cannot 
just mask the pointer and look at the chunk header because it 
might not be a chunk header at all. Know any production GCs 
that use hash table lookups for pointer assignments? Me 
neither. Ok ok, maybe Go does, it's the only language with GC 
that embraces interior pointers as stupid as that is.


Huge allocs are always handled specially by allocators. The 
usual technique is a radix tree. But it doesn't really matter 
for the discussion at hand: huge allocs are not numerous. If you 
have 4G of RAM, by definition, you have fewer than 1000 of them 
under the above-mentioned scheme. The whole lookup datastructure 
can fit in cache.


Re: Interior pointers and fast GC

2017-01-21 Thread Araq via Digitalmars-d

On Sunday, 22 January 2017 at 06:28:35 UTC, Chris Wright wrote:

On Sun, 22 Jan 2017 05:02:43 +0000, Araq wrote:
It's an O(1) that requires a hash table lookup in general 
because allocations can exceed the chunk size and so you 
cannot just mask the pointer and look at the chunk header 
because it might not be a chunk header at all. Know any 
production GCs that use hash table lookups for pointer 
assignments? Me neither. Ok ok, maybe Go does, it's the only 
language with GC that embraces interior pointers as stupid as 
that is.


Overexplaining because I'm tired.

If you have fixed-size allocations, you get much better time: 
O(log #pools) to find the pool, O(1) to find the offset in that 
pool. The cost is you multiply the number of pools in existence 
by the number of allocation sizes you decide to support.




Sure, O(log #pools) + O(1) is "much better" than O(1).

To shorten the discussion: You're looking for a variant of "card 
marking". No overhead for interior pointers if you do it right. I 
think; I never worked out the details, though.


http://stackoverflow.com/questions/19154607/how-actually-card-table-and-writer-barrier-works


Re: Interior pointers and fast GC

2017-01-21 Thread Chris Wright via Digitalmars-d
On Sun, 22 Jan 2017 05:02:43 +0000, Araq wrote:
> It's an O(1) that requires a hash table lookup in general because
> allocations can exceed the chunk size and so you cannot just mask the
> pointer and look at the chunk header because it might not be a chunk
> header at all. Know any production GCs that use hash table lookups for
> pointer assignments? Me neither. Ok ok, maybe Go does, it's the only
> language with GC that embraces interior pointers as stupid as that is.

Overexplaining because I'm tired.

If you have fixed-size allocations, you get much better time: O(log 
#pools) to find the pool, O(1) to find the offset in that pool. The cost 
is you multiply the number of pools in existence by the number of 
allocation sizes you decide to support.

This obviously helps for structs and classes.

It also helps for small arrays: a small array is a fixed-size allocation 
that can potentially be reallocated.

For instance, I decide to support allocations of 16, 32, and 256 bytes. 
The string "A British tar is a soaring soul" fits into a 32-byte 
allocation, so I put it in the 32-byte pool.

If I append the string " as free as a mountain bird" to it, that no 
longer fits, so I copy it into the 256-byte pool and append there.

If you then take a slice from it, "a soaring soul", I can find the 
relevant pool by examining every pool (or by traversing a search tree). 
The pool says it's a 32-byte pool, so rounding the pointer down to a 
multiple of 32 gives the base pointer.

Large heap objects (anything above the maximum supported size, which will 
be about one OS page) don't afford this convenience -- they aren't of any 
particular size, so you still need to search through every allocation.

So the performance of D's GC is rather strongly dependent on how large 
arrays tend to get.

Unless I'm misunderstanding the GC yet again.


Re: Interior pointers and fast GC

2017-01-21 Thread Araq via Digitalmars-d

On Saturday, 21 January 2017 at 17:42:46 UTC, deadalnix wrote:


1. Split the heap into chunks of size n, with n a power of 2, say 
4M. Align them on 4M boundaries.
2. Find the chunk an alloc is part of in O(1) by masking the 
lower bits (22 bits to mask in our 4M case).
3. Have a table of page descriptors in the chunk header. Look up 
the page the alloc is in in O(1).
4a. If the alloc is large (flag in the page descriptor), find 
the base pointer in O(1).
4b. If the alloc is small, compute the index of the item in the 
page from the size class in the page descriptor (one addition, 
one multiply and one shift) in O(1).


Start on a false premise, end up nowhere.


It's an O(1) that requires a hash table lookup in general because 
allocations can exceed the chunk size and so you cannot just mask 
the pointer and look at the chunk header because it might not be 
a chunk header at all. Know any production GCs that use hash 
table lookups for pointer assignments? Me neither. Ok ok, maybe 
Go does, it's the only language with GC that embraces interior 
pointers as stupid as that is.


Re: Interior pointers and fast GC

2017-01-21 Thread deadalnix via Digitalmars-d

On Saturday, 14 January 2017 at 04:37:01 UTC, Chris Wright wrote:
Unfortunately, given an interior pointer, you can't identify 
the base of its heap object in constant time.


1. Split the heap into chunks of size n, with n a power of 2, say 
4M. Align them on 4M boundaries.
2. Find the chunk an alloc is part of in O(1) by masking the lower 
bits (22 bits to mask in our 4M case).
3. Have a table of page descriptors in the chunk header. Look up 
the page the alloc is in in O(1).
4a. If the alloc is large (flag in the page descriptor), find the 
base pointer in O(1).
4b. If the alloc is small, compute the index of the item in the 
page from the size class in the page descriptor (one addition, one 
multiply and one shift) in O(1).


Start on a false premise, end up nowhere.


Re: Interior pointers and fast GC

2017-01-21 Thread Nick Treleaven via Digitalmars-d
On Saturday, 14 January 2017 at 15:30:42 UTC, Rainer Schuetze 
wrote:
In addition, you need to lookup the pool anyway to figure out 
if the pointer points to non-managed memory (stack, global 
data, malloc'd memory).


Makes me wonder about a GC'd language where each pointer is 
actually a member of a struct which also has a base allocation 
pointer. The base pointer could either only be set for managed 
allocations, or the low bit of the base address could be used to 
indicate such instead*. This would make for faster GC scans, but 
would also cause slowdowns when copying pointers. Pointer 
arithmetic would be just as efficient though.


(*) With this scheme, pointer arithmetic can be @safe in assert 
mode.


Re: Interior pointers and fast GC

2017-01-14 Thread Rainer Schuetze via Digitalmars-d



On 14.01.2017 05:37, Chris Wright wrote:

Interior pointers are a barrier to performant garbage collection.


[...]

 * In a binary search tree, a prefix tree, or the like

[...]

The third strategy is O(log N) in the expected case and has similar data
locality to the hashtable.

Obviously, we prefer the first strategy. Unfortunately, given an interior
pointer, you can't identify the base of its heap object in constant time.
I believe D's garbage collector uses a binary search tree.

D allows unrestricted interior pointers. This makes it difficult to
experiment with other garbage collectors, except for those designed for D
or C++.


The garbage-collected memory is organized as pools of growing size. For 
each pool there are lookup tables for each page that holds fixed-size 
memory blocks. That way, finding the base of an allocation is a matter of 
finding the pool (currently O(log #pools); by N I assume you mean the 
number of allocations), but the rest is constant time.


IIRC an application that uses 1 GB of memory has about 20 pools, which 
means the time to figure out the base address and the size of an 
allocation is more or less constant.


In addition, you need to look up the pool anyway to figure out if the 
pointer points to non-managed memory (stack, global data, malloc'd memory).


So, I don't think interior pointers are a performance problem of D's GC. 
Other GCs might not be able to deal with these, though.


The much larger problem with trying more sophisticated GCs is the lack 
of a fast write barrier. If we get one with compiler-assisted 
(automatic) reference counting, we might as well try this for other 
types of GC.


Interior pointers and fast GC

2017-01-13 Thread Chris Wright via Digitalmars-d
Interior pointers are a barrier to performant garbage collection.

Here, I'm brainstorming about the problem and not really coming to any 
conclusions.

The world today
===
D's garbage collector is significantly less performant than Java's, and 
this is partly the result of D's features. Interior pointers are costly.

What's the deal with interior pointers?
---
A heap object (for this discussion) is a series of bytes on the GC heap.

An interior pointer is a pointer to some memory location within a heap 
object, but not to the beginning of the heap object.

A garbage collector that never has to deal with interior pointers has 
several options for storing metadata related to a heap object:

 * In a segment of data immediately before the heap object
 * In a hashmap
 * In a binary search tree, a prefix tree, or the like

The first strategy is O(1) and has good data locality relative to the 
data pointed at. Eclipse OMR assumes you use this strategy (though it 
supports other methods).

The second strategy is O(1) in the expected case. By default, it has poor 
data locality -- though you can improve that with careful management of 
virtual memory.

The third strategy is O(log N) in the expected case and has similar data 
locality to the hashtable.

Obviously, we prefer the first strategy. Unfortunately, given an interior 
pointer, you can't identify the base of its heap object in constant time. 
I believe D's garbage collector uses a binary search tree.

D allows unrestricted interior pointers. This makes it difficult to 
experiment with other garbage collectors, except for those designed for D 
or C++.


Addressof
-
D lets you take the address of anything:

  class Paddock {
      uint goats, sheep;
  }
  auto paddock = new Paddock;
  auto goatPtr = &paddock.goats; // an interior pointer

The creation of interior pointers this way can be detected at compile 
time, if that benefits us.


Arrays
--
A D array is a struct containing a pointer and a length.

D encourages array slicing:

  auto str = "hello world!";
  foreach (word; str.split(' ')) {
      // This is an interior pointer
      assert(word.ptr >= str.ptr);
      assert(word.ptr < str.ptr + str.length);
  }

Array slicing is splendid and worthwhile, and eliminating it would be a 
huge cost.

We can at least potentially detect non-pathological cases of this.


Closures
--------

Closures can create interior pointers, depending on the compiler's 
implementation:

  class Paddock
  {
      uint goats, sheep;
      void trackCattle() {
          cattleEnter.connect((flock) {
              if (flock.isGoats) goats += flock.size;
              if (flock.isSheep) sheep += flock.size;
          });
      }
  }

The compiler might include a copy of the `this` pointer in the closure, 
or it might include direct pointers to `goats` and `sheep`. Both produce 
equivalent results aside from the creation of interior pointers.

When the closure is defined in a struct method, it is impossible to 
ensure that the contents of the closure do not include an interior 
pointer -- the struct itself might be a member of another heap object.


What options do we have?
===

If we want to reduce interior pointers, we have a number of options. We 
would likely need several to get reasonable results.

Forbid user-generated interior pointers
---
We might simply forbid taking the address of a member of a heap object, 
turning it into a compile-time error, or merely leaving it as undefined 
behavior.

This is unlikely to be feasible. D's lifeblood is array slicing, and that 
produces interior pointers left and right.


Augment arrays
--
We can augment arrays to eliminate interior pointers for common 
operations:

  struct Array(T)
  {
      private T* baseptr;
      size_t offset;
      size_t length;
      T* ptr() { return baseptr + offset; }
  }

It is possible to construct an array from a pointer, and any pointer 
might be an interior pointer, but this strategy would at least reduce the 
number of interior pointers we have floating around.


Change struct this pointer to pointer+offset
---

We convert struct `this` pointers to a pointer plus an offset in order to 
eliminate interior pointers from closures involving struct members. This 
probably requires adding a hidden field to each struct, which breaks C 
compatibility. It also requires a lot of bookkeeping, and I'm not sure 
that's always possible. Uninitialized struct values break this.


Mark allocations that might contain interior pointers
-
Since we can detect many things that might create interior pointers at 
compile time, we can explicitly mark them as such. Conversely, we can 
prove that many things do not contain interior pointers, and we could 
mark those.

When we scan a heap object that is marked as not containing interior 
pointers, we