I am considering a new representation for Swift refcounts and other per-object
data. This is an outline of the scheme. Comments and suggestions welcome.
Today, each object stores 64 bits of refcounts and flags after the isa field.
In this new system, each object would store a pointer-size field after the isa
field. This field would have two cases: it could store refcounts and flags, or
it could store a pointer to a side allocation that would store refcounts and
flags and additional per-object data.
Advantages:
* Saves 4 bytes per object on 32-bit for most objects.
* Improves refcount overflow and underflow detection.
* Might allow an inlineable retain/release fast path in the future.
* Allows a new weak reference implementation that doesn't need to keep entire
  dead objects alive.
* Allows inexpensive per-object storage for future features like associated
  references or class extensions with instance variables.
Disadvantages:
* Basic retain/release (RR) operations might be slower on x86_64. This needs to
  be measured. ARM architectures are probably unchanged.
----
The most significant bit (MSB) would distinguish between the fastest-path
in-object retain/release and everything else. Objects that use some other RR
path would have that bit
set. This would include objects whose refcount is stored in the side allocation
and objects whose refcount does not change because they are allocated on the
stack or in read-only memory.
The MSB also becomes set if you increment or decrement a retain count too
far. That means we can implement the RR fast path with a single conditional
branch after the increment or decrement:
    retain:
        intptr_t oldRC = obj->rc
        newRC = oldRC + RC_ONE    // sets MSB on overflow;
                                  // MSB already set for other special cases
        if (newRC >= 0) {
            CAS(obj->rc = oldRC => newRC)    // retry from the top if the CAS fails
        } else {
            call slow path
            //    out-of-object refcount (MSB bits 0b10x)
            // or refcount has overflowed (MSB bits 0b111)
            // or refcount is constant (MSB bits 0b110)
        }
    release:
        intptr_t oldRC = obj->rc
        newRC = oldRC - RC_ONE    // sets MSB on underflow;
                                  // MSB already set for other special cases
        if (newRC >= 0) {
            CAS(obj->rc = oldRC => newRC)    // retry from the top if the CAS fails
        } else {
            call slow path
            //    dealloc (MSB bits 0b111)
            // or out-of-object refcount (MSB bits 0b10x)
            // or refcount has underflowed (MSB bits 0b111 and deallocating bit
            //    already set)
            // or refcount is constant (MSB bits 0b110)
        }
There are some fussy bit representation details here to make sure that a
pre-existing MSB=1 does not become 0 after an increment or decrement.
(In the more distant future this fast path could be inlineable while preserving
ABI flexibility: if worst comes to worst we can set the MSB all the time and
force inliners to fall back to the slow path runtime function. We don't want to
do this yet though.)
The side allocation could be used for:
* New weak reference implementation that doesn't need to keep entire dead
objects alive.
* Associated references or class extensions with instance variables
* Full-size strong refcount and unowned refcount on 32-bit architectures
* Future concurrency data or debugging instrumentation data
The Objective-C runtime uses a side table for similar purposes. It has the
disadvantage that retrieving an object's side allocation requires use of a
global hash table, which is slow and requires locking. This scheme would be
faster and contention-free.
Installing a side allocation on an object would be a one-way operation for
thread-safety reasons. For example, an object might be given a side allocation
when it is first weakly referenced, but the object would not go back to
in-object refcounts if the weak reference went away. Most objects would not
need a side allocation.
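
The one-way install could look roughly like the C11 sketch below. Everything
in it is an assumption made for illustration: the names, the encoding (side
pointer shifted right by 3 with the MSB set, which relies on 8-byte alignment
and unused top pointer bits), and the decision to copy the in-object bits into
the side allocation unchanged. It also treats any MSB-set field as "already
has a side allocation", ignoring the constant and overflow encodings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define RC_MSB ((uint64_t)1 << 63)
#define RC_ONE ((uint64_t)1 << 33)

typedef struct Object { void *isa; _Atomic uint64_t rc; } Object;

typedef struct {
    Object *object;
    _Atomic uint64_t strong_rc;   /* refcount bits moved out of the object */
} SideAllocation;

/* Assumed encoding of the "pointer" case of the field. */
static uint64_t encode_side(SideAllocation *s) {
    return RC_MSB | ((uint64_t)(uintptr_t)s >> 3);
}
static SideAllocation *decode_side(uint64_t field) {
    return (SideAllocation *)(uintptr_t)((field & ~RC_MSB) << 3);
}

/* Returns the object's side allocation, installing one if needed.
   The field only ever goes from in-object bits to a side pointer. */
static SideAllocation *get_or_install_side(Object *obj) {
    SideAllocation *fresh = NULL;
    uint64_t old = atomic_load(&obj->rc);
    for (;;) {
        if (old & RC_MSB) {          /* another thread won the race */
            free(fresh);
            return decode_side(old);
        }
        if (!fresh) {
            fresh = malloc(sizeof *fresh);
            if (!fresh) return NULL;
            fresh->object = obj;
            atomic_init(&fresh->strong_rc, 0);
        }
        /* Not yet published, so a relaxed store is enough here. */
        atomic_store_explicit(&fresh->strong_rc, old, memory_order_relaxed);
        if (atomic_compare_exchange_weak(&obj->rc, &old, encode_side(fresh)))
            return fresh;
        /* CAS failed: old was reloaded (the refcount moved), so retry. */
    }
}
```

Because the CAS compares against the exact in-object bits that were copied, a
concurrent fast-path retain or release forces a retry and no count is lost.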
----
Weak references could be implemented using the side allocation. A weak variable
would point to the object's side allocation. The side allocation would store a
pointer to the object and a strong refcount and a weak refcount. (This weak
refcount would be distinct from the unowned refcount.) The weak refcount would
be incremented once for every weak variable holding this object.
The advantage of using a side allocation for weak references is that the
storage for a weakly-referenced object could be freed synchronously when deinit
completes. Only the small side allocation would remain, backing the weak
variables until they are cleared on their next access. This is a memory
improvement over today's scheme, which keeps the object's entire storage alive
for a potentially long time.
The hierarchy:
    Strong refcount goes to zero: deinit
    Unowned refcount goes to zero: free the object
    Weak refcount goes to zero: free the side allocation
When a weakly-referenced object is destroyed, it would free its own storage but
leave the side allocation alive until all of the weak references go away.
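
To make the ordering concrete, here is a small single-threaded C model of the
three counts. It is only a sketch. In particular it assumes that the strong
count as a whole holds one unowned reference and the unowned count as a whole
holds one weak reference, so that each level releases the next when it reaches
zero. That convention and all of the names are assumptions, not part of this
proposal.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int strong, unowned, weak;
    bool deinit_ran, object_freed, side_freed;  /* what has happened so far */
} Counts;

static void weak_release(Counts *c) {
    assert(c->weak > 0);
    if (--c->weak == 0) c->side_freed = true;      /* free the side allocation */
}

static void unowned_release(Counts *c) {
    assert(c->unowned > 0);
    if (--c->unowned == 0) {
        c->object_freed = true;                    /* free the object */
        weak_release(c);                           /* drop the unowned level's hold */
    }
}

static void strong_release(Counts *c) {
    assert(c->strong > 0);
    if (--c->strong == 0) {
        c->deinit_ran = true;                      /* deinit */
        unowned_release(c);                        /* drop the strong level's hold */
    }
}
```

With one strong reference and one weak variable (weak count 2 under the
assumed convention), the last strong_release runs deinit and frees the object
immediately, and the side allocation survives until the weak variable lets go.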
When a weak variable is read, it would go to the side allocation first and
atomically increment the strong refcount if the deallocating bit were not set.
Then it would return the object pointer stored in the side allocation. If the
deallocating bit were set, the read would instead clear the variable,
atomically decrement the weak refcount, and free the side allocation if the
count reaches zero. (There is another race here that probably requires separate
side bits for object-is-deallocating and object-is-deallocated.)
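
A rough C11 sketch of that read path follows. The layout of the side
allocation's strong word (deallocating flag in bit 0, count above it), the
WeakVar type, and the sides_freed counter are all hypothetical. The weak
variable itself is a plain pointer here, so concurrent access to the same
variable and the deallocating/deallocated race mentioned above are not handled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout of the strong word in the side allocation. */
#define DEALLOCATING ((uint64_t)1)
#define STRONG_ONE   ((uint64_t)2)

typedef struct {
    void *object;
    _Atomic uint64_t strong;
    _Atomic uint64_t weak;     /* one per weak variable holding the object */
} SideAllocation;

typedef struct { SideAllocation *side; } WeakVar;

static int sides_freed;        /* instrumentation for this sketch only */

static void weak_release(SideAllocation *side) {
    if (atomic_fetch_sub(&side->weak, 1) == 1) {
        free(side);
        sides_freed++;
    }
}

/* Returns a strongly retained object pointer, or NULL if the object is gone. */
static void *weak_load(WeakVar *w) {
    SideAllocation *side = w->side;
    if (!side) return NULL;
    uint64_t old = atomic_load(&side->strong);
    while (!(old & DEALLOCATING)) {
        if (atomic_compare_exchange_weak(&side->strong, &old, old + STRONG_ONE))
            return side->object;
        /* CAS failed: old was reloaded, so the flag is checked again. */
    }
    w->side = NULL;            /* clear the variable on this access */
    weak_release(side);        /* may free the side allocation */
    return NULL;
}
```

The increment is a CAS loop and not a plain fetch-add so that the strong count
is never raised after the deallocating bit has been observed as set.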
When an old value is erased from a weak variable, it would atomically decrement
the weak refcount in the side allocation and free the side allocation if it
reaches zero.
When a new value is stored to a weak variable, it would install a
side allocation if necessary, then check the deallocating bit in the side
allocation. If the object is not deallocating it would atomically increment the
weak refcount.
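
The erase and store operations might be sketched in C11 as below, with the same
caveats as any sketch here: the types, the bit-0 deallocating flag, and the
sides_freed counter are hypothetical, the side allocation is assumed to be
installed already, and the check-then-increment in weak_store is exactly as
racy as the prose description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define DEALLOCATING ((uint64_t)1)   /* assumed flag bit in the strong word */

typedef struct {
    void *object;
    _Atomic uint64_t strong;
    _Atomic uint64_t weak;
} SideAllocation;

typedef struct { SideAllocation *side; } WeakVar;

static int sides_freed;              /* instrumentation for this sketch only */

static void weak_release(SideAllocation *side) {
    if (atomic_fetch_sub(&side->weak, 1) == 1) {
        free(side);
        sides_freed++;
    }
}

/* Erase the old value: drop this variable's weak count. */
static void weak_erase(WeakVar *w) {
    SideAllocation *old = w->side;
    w->side = NULL;
    if (old) weak_release(old);
}

/* Store a new value. The new side allocation is counted before the old one
   is released, so storing the value a variable already holds is safe. */
static void weak_store(WeakVar *w, SideAllocation *newSide) {
    SideAllocation *old = w->side;
    if (newSide && !(atomic_load(&newSide->strong) & DEALLOCATING)) {
        atomic_fetch_add(&newSide->weak, 1);
        w->side = newSide;
    } else {
        w->side = NULL;              /* deallocating objects read as nil */
    }
    if (old) weak_release(old);
}
```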
----
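RR fast paths in untested x86_64 assembly (AT&T syntax, destination on the
right):
    retain_fast:
        // object in %rdi
        mov     8(%rdi), %rax
        movabs  $0x200000000, %rcx      // RC_ONE does not fit a 32-bit immediate
    1:  mov     %rax, %rdx
        add     %rcx, %rdx
        js      retain_slow             // MSB set: take the slow path
        lock cmpxchg %rdx, 8(%rdi)      // on failure, %rax holds the current value
        jne     1b
        ret
    release_fast:
        // object in %rdi
        mov     8(%rdi), %rax
        movabs  $0x200000000, %rcx      // RC_ONE does not fit a 32-bit immediate
    1:  mov     %rax, %rdx
        sub     %rcx, %rdx
        js      release_slow            // MSB set: take the slow path
        lock cmpxchg %rdx, 8(%rdi)      // on failure, %rax holds the current value
        jne     1b
        ret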
RR fast paths in untested arm64 assembly:
    retain_fast:
        // object in x0
        add     x1, x0, #8
    1:  ldxr    x2, [x1]
        mov     x3, #0x200000000
        adds    x2, x2, x3
        b.mi    retain_slow
        stxr    w4, x2, [x1]
        cbnz    w4, 1b                  // w4 is nonzero if the store failed
        ret
    release_fast:
        // object in x0
        add     x1, x0, #8
    1:  ldxr    x2, [x1]
        mov     x3, #0x200000000
        subs    x2, x2, x3
        b.mi    release_slow
        stlxr   w4, x2, [x1]
        cbnz    w4, 1b                  // w4 is nonzero if the store failed
        ret
--
Greg Parker [email protected] Runtime Wrangler
_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev