I am considering a new representation for Swift refcounts and other per-object 
data. This is an outline of the scheme. Comments and suggestions welcome.

Today, each object stores 64 bits of refcounts and flags after the isa field.

In this new system, each object would store a pointer-size field after the isa 
field. This field would have two cases: it could store refcounts and flags, or 
it could store a pointer to a side allocation that would store refcounts and 
flags and additional per-object data.
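To make the two cases concrete, here is a minimal C sketch of how such a word might be tagged and decoded. Only the "MSB bits 0b10x means out-of-object refcount" convention comes from this proposal. Storing the side-allocation pointer in the remaining bits is an assumption that holds only on 64-bit targets where heap addresses never use their top two bits. The names SIDE_TAG, SIDE_TAG_MASK and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical encoding of the pointer-size word that follows isa.
// Assumption: heap pointers never have their top two bits set on this target.
#define SIDE_TAG_MASK (UINT64_C(3) << 62)
#define SIDE_TAG      (UINT64_C(2) << 62)   // MSB bits 0b10x: side allocation

typedef struct SideAllocation {
    uint64_t refcountsAndFlags;             // plus any extra per-object data
} SideAllocation;

static inline bool has_side_allocation(uint64_t bits) {
    return (bits & SIDE_TAG_MASK) == SIDE_TAG;
}

static inline uint64_t encode_side_allocation(SideAllocation *side) {
    uint64_t p = (uint64_t)(uintptr_t)side;
    assert((p & SIDE_TAG_MASK) == 0);       // the tag bits must be free
    return p | SIDE_TAG;
}

static inline SideAllocation *decode_side_allocation(uint64_t bits) {
    return (SideAllocation *)(uintptr_t)(bits & ~SIDE_TAG_MASK);
}
```

Inline refcounts (MSB clear) and the 0b11x special cases both fail the `has_side_allocation` test, so a single mask-and-compare separates the two cases.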

Advantages:
* Saves 4 bytes per object on 32-bit for most objects.
* Improves refcount overflow and underflow detection.
* Might allow an inlineable retain/release fast path in the future.
* Allows a new weak reference implementation that doesn't need to keep entire 
dead objects alive.
* Allows inexpensive per-object storage for future features like associated 
references or class extensions with instance variables.

Disadvantages:
* Basic retain/release (RR) operations might be slower on x86_64. This needs to 
be measured. ARM architectures are probably unchanged.

----

The most significant bit (MSB) would distinguish the fastest-path in-object 
retain/release from everything else. Objects that use some other RR path would 
have that bit set. This would include objects whose refcount is stored in the 
side allocation and objects whose refcount does not change because they are 
allocated on the stack or in read-only memory.

The MSB also becomes set if you increment or decrement a retain count too far. 
That means we can implement the RR fast path with a single conditional branch 
after the increment or decrement:

retain:
    intptr_t oldRC = obj->rc
    loop {
        intptr_t newRC = oldRC + RC_ONE  // sets MSB on overflow; MSB already
                                         // set for other special cases
        if (newRC < 0) {
            call slow path and return
            // out-of-object refcount     (MSB bits 0b10x)
            // or refcount has overflowed (MSB bits 0b111)
            // or refcount is constant    (MSB bits 0b110)
        }
        if (CAS(obj->rc = oldRC => newRC)) return
        // CAS failed: oldRC now holds the current value; retry
    }

release:
    intptr_t oldRC = obj->rc
    loop {
        intptr_t newRC = oldRC - RC_ONE  // sets MSB on underflow; MSB already
                                         // set for other special cases
        if (newRC < 0) {
            call slow path and return
            // dealloc                     (MSB bits 0b111)
            // or out-of-object refcount   (MSB bits 0b10x)
            // or refcount has underflowed (MSB bits 0b111 and deallocating
            //                              bit already set)
            // or refcount is constant     (MSB bits 0b110)
        }
        if (CAS(obj->rc = oldRC => newRC)) return
        // CAS failed: oldRC now holds the current value; retry
    }

There are some fussy bit representation details here to make sure that a 
pre-existing MSB=1 does not become 0 after an increment or decrement. 
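For readers who prefer compilable code to pseudocode, here is a minimal C11 sketch of the same fast paths. RC_ONE is taken from the 0x200000000 constant in the assembly at the end of this message. Two further assumptions are not in the proposal: the stored count is the number of strong references minus one (so releasing the last reference drives the field negative, the "dealloc" case), and the slow paths are hypothetical stubs that only count calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

// RC_ONE matches the 0x200000000 constant used in the assembly.
#define RC_ONE ((int64_t)1 << 33)

typedef struct {
    void *isa;
    _Atomic int64_t rc;   // inline refcounts and flags; MSB clear on fast path
} Object;

// Hypothetical slow paths: they only count calls here. A real runtime would
// handle side allocations, overflow, constant objects, and dealloc.
static int slow_retains, slow_releases;
static void retain_slow(Object *obj)  { (void)obj; slow_retains++; }
static void release_slow(Object *obj) { (void)obj; slow_releases++; }

static void retain(Object *obj) {
    int64_t oldRC = atomic_load_explicit(&obj->rc, memory_order_relaxed);
    for (;;) {
        // Wrapping add in unsigned arithmetic (signed overflow is UB in C).
        int64_t newRC = (int64_t)((uint64_t)oldRC + (uint64_t)RC_ONE);
        if (newRC < 0) { retain_slow(obj); return; }   // the single check
        // On failure, oldRC is refreshed with the current value; retry.
        if (atomic_compare_exchange_weak_explicit(&obj->rc, &oldRC, newRC,
                memory_order_relaxed, memory_order_relaxed))
            return;
    }
}

static void release(Object *obj) {
    int64_t oldRC = atomic_load_explicit(&obj->rc, memory_order_relaxed);
    for (;;) {
        int64_t newRC = (int64_t)((uint64_t)oldRC - (uint64_t)RC_ONE);
        if (newRC < 0) { release_slow(obj); return; }  // includes dealloc
        if (atomic_compare_exchange_weak_explicit(&obj->rc, &oldRC, newRC,
                memory_order_release, memory_order_relaxed))
            return;
    }
}
```

Note that an MSB-set value is never written back by the fast path: the sign check happens before the CAS, so the special encodings only change inside the slow paths.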

(In the more distant future this fast path could be inlineable while preserving 
ABI flexibility: if worst comes to worst we can set the MSB all the time and 
force inliners to fall back to the slow path runtime function. We don't want to 
do this yet though.)

The side allocation could be used for:
* New weak reference implementation that doesn't need to keep entire dead 
objects alive.
* Associated references or class extensions with instance variables
* Full-size strong refcount and unowned refcount on 32-bit architectures
* Future concurrency data or debugging instrumentation data

The Objective-C runtime uses a side table for similar purposes. It has the 
disadvantage that retrieving an object's side allocation requires use of a 
global hash table, which is slow and requires locking. This scheme would be 
faster and contention-free.

Installing a side allocation on an object would be a one-way operation for 
thread-safety reasons. For example, an object might be given a side allocation 
when it is first weakly referenced, but the object would not go back to 
in-object refcounts if the weak reference went away. Most objects would not 
need a side allocation.
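A minimal C11 sketch of what a one-way installation could look like, assuming the same hypothetical tag encoding as a 64-bit target where heap pointers leave the top two bits clear. The winner of the CAS publishes its side allocation, and a losing thread frees its own candidate and uses the winner's. Because nothing ever replaces the tagged pointer, any thread that has observed it can keep using it without a lock. The function and type names are illustrative, and constant or overflowed objects (MSB bits 0b11x) are not handled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SIDE_TAG_MASK (UINT64_C(3) << 62)
#define SIDE_TAG      (UINT64_C(2) << 62)   // MSB bits 0b10x

typedef struct SideAllocation {
    uint64_t inlineBits;   // refcounts and flags moved out of the object
    // weak refcount, associated references, etc. would follow
} SideAllocation;

typedef struct {
    void *isa;
    _Atomic uint64_t bits;
} Object;

// Returns obj's side allocation, installing one if needed (one-way).
static SideAllocation *get_or_install_side(Object *obj) {
    uint64_t old = atomic_load_explicit(&obj->bits, memory_order_acquire);
    SideAllocation *fresh = NULL;
    for (;;) {
        if ((old & SIDE_TAG_MASK) == SIDE_TAG) {
            free(fresh);   // already installed, possibly by another thread
            return (SideAllocation *)(uintptr_t)(old & ~SIDE_TAG_MASK);
        }
        if (fresh == NULL && (fresh = malloc(sizeof *fresh)) == NULL)
            return NULL;
        fresh->inlineBits = old;   // carry the current refcounts over
        uint64_t tagged = (uint64_t)(uintptr_t)fresh | SIDE_TAG;
        if (atomic_compare_exchange_weak_explicit(&obj->bits, &old, tagged,
                memory_order_acq_rel, memory_order_acquire))
            return fresh;
        // CAS failed: old now holds the current bits; re-check and retry.
    }
}
```

A concurrent inline retain or release on the same object simply sees its own CAS fail once the tagged pointer is stored, which sends it to the slow path.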

----

Weak references could be implemented using the side allocation. A weak variable 
would point to the object's side allocation. The side allocation would store a 
pointer to the object and a strong refcount and a weak refcount. (This weak 
refcount would be distinct from the unowned refcount.)  The weak refcount would 
be incremented once for every weak variable holding this object. 

The advantage of using a side allocation for weak references is that the 
storage for a weakly-referenced object could be freed synchronously when deinit 
completes. Only the small side allocation would remain, backing the weak 
variables until they are cleared on their next access. This is a memory 
improvement over today's scheme, which keeps the object's entire storage alive 
for a potentially long time.

The hierarchy:
  Strong refcount goes to zero: deinit
  Unowned refcount goes to zero: free the object
  Weak refcount goes to zero: free the side allocation
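The hierarchy can be illustrated with a small single-threaded C model. One way to guarantee the ordering (an assumption, not something stated above) is for all strong references together to hold one unowned reference, and all unowned references together to hold one weak reference. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

// Single-threaded model of the three counts and what happens at zero.
typedef struct {
    int strong, unowned, weak;
    bool deinited, objectFreed, sideFreed;
} Counts;

static Counts make_object(void) {
    // one strong reference, +1 unowned held by the strong side,
    // +1 weak held by the unowned side
    Counts c = { .strong = 1, .unowned = 1, .weak = 1 };
    return c;
}

static void release_weak(Counts *c) {
    if (--c->weak == 0) c->sideFreed = true;   // free the side allocation
}

static void release_unowned(Counts *c) {
    if (--c->unowned == 0) {
        c->objectFreed = true;                 // free the object
        release_weak(c);                       // drop the unowned side's +1
    }
}

static void release_strong(Counts *c) {
    if (--c->strong == 0) {
        c->deinited = true;                    // run deinit
        release_unowned(c);                    // drop the strong side's +1
    }
}
```

With one weak variable outstanding, releasing the last strong reference runs deinit and frees the object, but the side allocation survives until the weak variable lets go.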

When a weakly-referenced object is destroyed, it would free its own storage but 
leave the side allocation alive until all of the weak references go away. 

When a weak variable is read, it would go to the side allocation first and 
atomically increment the strong refcount if the deallocating bit is not set. 
Then it would return the object pointer stored in the side allocation. If the 
deallocating bit is set, it would instead clear the weak variable, atomically 
decrement the weak refcount, and free the side allocation if that count reaches 
zero. (There is another race here that probably requires separate side bits for 
object-is-deallocating and object-is-deallocated.)

When an old value is erased from a weak variable, it would atomically decrement 
the weak refcount in the side allocation and free the side allocation if it 
reaches zero.

When a new value is written to a weak variable, it would install a side 
allocation on the object if necessary, then check the deallocating bit in the 
side allocation. If the object is not deallocating, it would atomically 
increment the weak refcount.
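The read, erase, and store operations above can be sketched in C11 as follows. This is a minimal illustration under several assumptions that are not part of the proposal: the strong count and a DEALLOCATING flag share one atomic word, so "increment unless deallocating" is a single CAS; the live object holds one weak count of its own (mirroring the hierarchy above); the weak variable's own slot is not written concurrently; and the separate object-is-deallocated bit is not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define DEALLOCATING (UINT64_C(1) << 63)

typedef struct {
    void *object;              // the object this side allocation describes
    _Atomic uint64_t strong;   // strong refcount | DEALLOCATING
    _Atomic uint64_t weak;     // weak variables, plus 1 held by the live object
} SideAllocation;

typedef struct {
    SideAllocation *side;      // a weak variable points at the side allocation
} WeakVar;

static void side_release_weak(SideAllocation *side) {
    if (atomic_fetch_sub(&side->weak, 1) == 1)
        free(side);            // last weak count: free the side allocation
}

// Strong release once the refcount lives in the side allocation.
static void side_release_strong(SideAllocation *side) {
    uint64_t s = atomic_load(&side->strong), next;
    do {
        next = (s == 1) ? DEALLOCATING : s - 1;
    } while (!atomic_compare_exchange_weak(&side->strong, &s, next));
    if (next == DEALLOCATING) {
        // deinit would run here; the object's storage is freed right away
        free(side->object);
        side->object = NULL;
        side_release_weak(side);   // drop the object's own weak count
    }
}

// Read: returns the object with +1 strong, or NULL once it is deallocating.
static void *weak_load(WeakVar *var) {
    SideAllocation *side = var->side;
    if (side == NULL) return NULL;
    uint64_t s = atomic_load(&side->strong);
    while (!(s & DEALLOCATING)) {
        if (atomic_compare_exchange_weak(&side->strong, &s, s + 1))
            return side->object;
    }
    var->side = NULL;              // clear the variable on this access
    side_release_weak(side);
    return NULL;
}

// Erase: drop the old value's weak count.
static void weak_erase(WeakVar *var) {
    if (var->side != NULL) {
        side_release_weak(var->side);
        var->side = NULL;
    }
}

// Store: assumes the caller already installed newSide on the new object.
static void weak_store(WeakVar *var, SideAllocation *newSide) {
    SideAllocation *old = var->side;
    var->side = NULL;
    if (newSide != NULL && !(atomic_load(&newSide->strong) & DEALLOCATING)) {
        atomic_fetch_add(&newSide->weak, 1);
        var->side = newSide;
    }
    if (old != NULL) side_release_weak(old);   // only after taking the new one
}
```

The store takes the new weak count before dropping the old one, so overwriting a weak variable with the same object can never free the side allocation out from under it.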

----

RR fast paths in untested x86_64 assembly (AT&T syntax, destination on the 
right):

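retain_fast:
   // object in %rdi
   movabs $0x200000000, %rcx   // RC_ONE does not fit in a 32-bit immediate
   mov    8(%rdi), %rax
1: mov    %rax, %rdx
   add    %rcx, %rdx
   js     retain_slow          // MSB set: take the slow path
   lock cmpxchg %rdx, 8(%rdi)  // on failure, %rax gets the current value
   jne    1b
   ret

release_fast:
   // object in %rdi
   movabs $0x200000000, %rcx
   mov    8(%rdi), %rax
1: mov    %rax, %rdx
   sub    %rcx, %rdx
   js     release_slow
   lock cmpxchg %rdx, 8(%rdi)
   jne    1b
   ret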


RR fast paths in untested arm64 assembly

retain_fast:
   // object in x0
   add   x1, x0, #8
   mov   x3, #0x200000000
1: ldxr  x2, [x1]
   adds  x2, x2, x3
   b.mi  retain_slow
   stxr  w4, x2, [x1]
   cbnz  w4, 1b          // stxr writes 1 to w4 on failure: retry
   ret

release_fast:
   // object in x0
   add   x1, x0, #8
   mov   x3, #0x200000000
1: ldxr  x2, [x1]
   subs  x2, x2, x3
   b.mi  release_slow
   stlxr w4, x2, [x1]
   cbnz  w4, 1b
   ret


-- 
Greg Parker     gpar...@apple.com     Runtime Wrangler


_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
