Do STRINGs have value or reference semantics?
I'm exploring the idea of forbidding in-place modification of STRINGs in the C
API; the functions would instead return new STRING headers containing the
changes. This has implications for PIR code that expects STRINGs to have
reference semantics -- that is, that you can modify a STRING referred to from
multiple locations.
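To make the distinction concrete, here is a minimal sketch. It assumes a hypothetical, simplified `Str` header and hypothetical `str_append_inplace()` / `str_append()` functions; none of these are Parrot's actual STRING API. The in-place version shows reference semantics, where every holder of the pointer sees the change. The other version shows the proposed value semantics, where the original header is left alone and a new header is returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified STRING header (not Parrot's real layout). */
typedef struct {
    char   *buf;  /* NUL-terminated bytes */
    size_t  len;  /* length in bytes, excluding the NUL */
} Str;

/* Allocate or abort, so the sketch needs no error plumbing. */
static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        abort();
    return p;
}

static Str *str_new(const char *s)
{
    Str *h = xmalloc(sizeof *h);
    h->len = strlen(s);
    h->buf = xmalloc(h->len + 1);
    memcpy(h->buf, s, h->len + 1);
    return h;
}

static void str_free(Str *s)
{
    free(s->buf);
    free(s);
}

/* Reference semantics: grow the existing header in place, so every
 * pointer to this header observes the new contents. */
static void str_append_inplace(Str *s, const char *suffix)
{
    assert(s != NULL && suffix != NULL);
    size_t extra = strlen(suffix);
    char *p = realloc(s->buf, s->len + extra + 1);
    if (p == NULL)
        abort();
    memcpy(p + s->len, suffix, extra + 1);
    s->buf = p;
    s->len += extra;
}

/* Value semantics: leave `s` untouched and return a brand-new header. */
static Str *str_append(const Str *s, const char *suffix)
{
    assert(s != NULL && suffix != NULL);
    size_t extra = strlen(suffix);
    Str *h = xmalloc(sizeof *h);
    h->len = s->len + extra;
    h->buf = xmalloc(h->len + 1);
    memcpy(h->buf, s->buf, s->len);
    memcpy(h->buf + s->len, suffix, extra + 1);
    return h;
}
```

With the value-semantics version, a caller that wants the longer string has to store the result (`s = str_append(s, "bar");`). Other locations holding the old header keep seeing the old contents, which is the behavior change PIR code relying on shared modification would notice.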
Currently Parrot seems to prefer reference semantics. A handful of
frequently called C functions perform a copy-on-write (COW) operation to
create a new STRING header every time a STRING header escapes. Because they
can't tell whether the escaping header will get modified, they have to
allocate a new COW header for every escape, even if the header only ever gets
read (or becomes garbage immediately).
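The cost described above can be modeled in a few lines. This is a minimal sketch assuming a hypothetical `Str` header that points at a reference-counted `Buf`. Parrot's real headers, COW flags, and GC-managed buffers differ. The sketch shows that an always-COW escape allocates one header per escape even when nobody writes. The bytes are copied only if someone actually writes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not Parrot's real STRING/Buffer layout. */
typedef struct {
    char   *data;
    size_t  refs;   /* number of headers sharing this buffer */
} Buf;

typedef struct {
    Buf    *buf;
    size_t  len;
} Str;

static size_t headers_allocated = 0;  /* counts every header allocation */

static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        abort();
    return p;
}

static Str *str_new(const char *s)
{
    Str *h = xmalloc(sizeof *h);
    h->len = strlen(s);
    h->buf = xmalloc(sizeof *h->buf);
    h->buf->data = xmalloc(h->len + 1);
    memcpy(h->buf->data, s, h->len + 1);
    h->buf->refs = 1;
    headers_allocated++;
    return h;
}

/* COW copy: a new header sharing the same buffer; no bytes are copied. */
static Str *str_cow_copy(Str *s)
{
    Str *h = xmalloc(sizeof *h);
    h->buf = s->buf;
    h->len = s->len;
    h->buf->refs++;
    headers_allocated++;
    return h;
}

/* Before a write, detach from a shared buffer by copying its bytes. */
static void str_make_writable(Str *s)
{
    if (s->buf->refs > 1) {
        Buf *b = xmalloc(sizeof *b);
        b->data = xmalloc(s->len + 1);
        memcpy(b->data, s->buf->data, s->len + 1);
        b->refs = 1;
        s->buf->refs--;
        s->buf = b;
    }
}

static void str_set_char(Str *s, size_t i, char c)
{
    assert(i < s->len);
    str_make_writable(s);
    s->buf->data[i] = c;
}

static void str_free(Str *s)
{
    if (--s->buf->refs == 0) {
        free(s->buf->data);
        free(s->buf);
    }
    free(s);
}

/* Always-COW escape: the callee can't know whether the receiver will
 * write, so it pays for a new header on every escape. */
static Str *escape_always_cow(Str *s)
{
    return str_cow_copy(s);
}
```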
The NQP-rx benchmark is representative of likely HLL performance:
    ./parrot ext/nqp-rx/nqp-rx.pbc --target=pir Actions.pm
About 72% of all STRING COW headers created are for internal bookkeeping only
-- to prevent the accidental modification of a STRING out from underneath
something else that uses it. This occurs in two places in the benchmark. The
first is when fetching the STRING contents of a Key PMC. The second is when
using a constant STRING (one created with CONST_STRING in our .c files, for
example, or appearing as a literal in PIR) as a parameter to a function.
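Dropping the always-COW from the Key PMC might look roughly like the following under an API that never modifies STRINGs in place. `Key`, `Str`, `key_get_string_cow()`, and `key_get_string()` are hypothetical stand-ins, not Parrot's vtable code. The always-COW getter allocates a header on every fetch. The shared getter returns the stored header and allocates nothing, which is only safe because no function may modify it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical STRING header; with no in-place API, nothing writes to it. */
typedef struct {
    const char *data;
    size_t      len;
} Str;

/* Hypothetical stand-in for a Key PMC holding a STRING. */
typedef struct {
    const Str *str;
} Key;

static size_t headers_allocated = 0;  /* counts header allocations */

static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        abort();
    return p;
}

/* Current policy: hand out a fresh header (sharing the bytes) on every
 * fetch, so a caller that changes "its" header can't disturb the key. */
static Str *key_get_string_cow(const Key *k)
{
    assert(k != NULL);
    Str *h = xmalloc(sizeof *h);
    *h = *k->str;            /* copies pointer and length, not the bytes */
    headers_allocated++;
    return h;
}

/* Proposed policy: return the stored header itself; no allocation. */
static const Str *key_get_string(const Key *k)
{
    assert(k != NULL);
    return k->str;
}
```

Returning `const Str *` lets the C compiler reject accidental writes through the shared header, which is one way to enforce value semantics at the API boundary.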
A third case, which does not appear in this benchmark, is fetching the name of
a Class. (You can imagine how modifying that STRING in place would cause
problems.)
Note that the String PMC's get_string() vtable entry always returns a COW
STRING. The set S, SC opcode performs COW on the STRING constant.
Removing the always-COW from the Key PMC (when dealing with STRINGs) speeds up
the benchmark by 2.504%.
Removing the always-COW from constant STRINGs used as function parameters
speeds up the benchmark by 1.204%.
Both together speed up the benchmark by 3.678%.
This particular benchmark shows no change in GC performance, which suggests
that the GC pressure comes primarily from PMCs. A benchmark that puts more
STRING pressure on the GC would likely show a larger benefit.
A couple of test files fail with these changes, but the failures are where you
might expect them:
    t/op/string.t (Wstat: 11 Tests: 392 Failed: 0)
      Non-zero wait status: 11
      Parse errors: Bad plan. You planned 411 tests but ran 392.
    t/pmc/key.t (Wstat: 11 Tests: 8 Failed: 0)
      Non-zero wait status: 11
      Parse errors: Bad plan. You planned 9 tests but ran 8.
-- c
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev