Do STRINGs have value or reference semantics?

I'm exploring the idea of forbidding in-place modification of STRINGs in the C 
API; the functions will return new STRING headers with the changes.  This has 
implications for PIR code which expects that STRINGs have reference semantics 
-- that you can modify a STRING referred to from multiple locations.
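
To make the distinction concrete, here is a minimal sketch in plain C.  None 
of these names are Parrot's; demo_string and the demo_* functions are 
hypothetical stand-ins for a STRING header and two styles of API.  The first 
function mutates through the header, so every alias sees the change.  The 
second leaves its argument alone and hands back a new header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a STRING header; not Parrot's real struct. */
typedef struct demo_string {
    char  *buf;
    size_t len;
} demo_string;

/* Allocate or abort; keeps the sketch free of error plumbing. */
static void *demo_xmalloc(size_t n) {
    void *p = malloc(n);
    if (!p)
        abort();
    return p;
}

static demo_string *demo_new(const char *text) {
    demo_string *s = demo_xmalloc(sizeof *s);
    s->len = strlen(text);
    s->buf = demo_xmalloc(s->len + 1);
    memcpy(s->buf, text, s->len + 1);
    return s;
}

static void demo_free(demo_string *s) {
    if (s) {
        free(s->buf);
        free(s);
    }
}

/* Reference semantics: mutate in place; every alias sees the change. */
static void demo_upcase_inplace(demo_string *s) {
    assert(s != NULL);
    for (size_t i = 0; i < s->len; i++)
        s->buf[i] = (char)toupper((unsigned char)s->buf[i]);
}

/* Value semantics: the argument is untouched; the caller gets a new header. */
static demo_string *demo_upcase_new(const demo_string *s) {
    assert(s != NULL);
    demo_string *r = demo_new(s->buf);
    demo_upcase_inplace(r);
    return r;
}
```

With the second style, code holding another pointer to the original never 
sees it change underneath it, which is exactly what PIR code relying on 
reference semantics would notice.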

Currently Parrot seems to prefer reference semantics.  A handful of 
frequently-called C functions perform a copy-on-write (COW) operation to 
create a new STRING header every time a STRING header escapes.  Because they 
can't tell whether the escaping header will get modified, they have to 
allocate a new COW header for every one, even if the header only ever gets 
read (or becomes garbage immediately).
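
A rough model of that cost, again in plain C with hypothetical names (this is 
not Parrot's actual STRING layout or API): headers share a reference-counted 
buffer, an "escape" always allocates a fresh header, and only a write 
actually copies the bytes.  The counter is there just to show that a 
read-only escape still pays for a header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: several headers may share one counted buffer. */
typedef struct demo_buf {
    size_t refs;
    size_t len;
    char  *bytes;
} demo_buf;

typedef struct demo_header {
    demo_buf *buf;
} demo_header;

static size_t demo_headers_allocated = 0;   /* bookkeeping for the demo */

static void *demo_xmalloc(size_t n) {
    void *p = malloc(n);
    if (!p)
        abort();
    return p;
}

static demo_header *demo_header_for(demo_buf *b) {
    demo_header *h = demo_xmalloc(sizeof *h);
    h->buf = b;
    b->refs++;
    demo_headers_allocated++;
    return h;
}

static demo_header *demo_new(const char *text) {
    demo_buf *b = demo_xmalloc(sizeof *b);
    b->refs  = 0;
    b->len   = strlen(text);
    b->bytes = demo_xmalloc(b->len + 1);
    memcpy(b->bytes, text, b->len + 1);
    return demo_header_for(b);
}

/* "Always COW on escape": a new header even if the receiver only reads. */
static demo_header *demo_escape_cow(demo_header *h) {
    return demo_header_for(h->buf);
}

/* A write unshares the buffer first, so other headers never see it. */
static void demo_set_char(demo_header *h, size_t i, char c) {
    assert(i < h->buf->len);
    if (h->buf->refs > 1) {
        demo_buf *old  = h->buf;
        demo_buf *copy = demo_xmalloc(sizeof *copy);
        copy->refs  = 1;
        copy->len   = old->len;
        copy->bytes = demo_xmalloc(old->len + 1);
        memcpy(copy->bytes, old->bytes, old->len + 1);
        old->refs--;
        h->buf = copy;
    }
    h->buf->bytes[i] = c;
}

static void demo_release(demo_header *h) {
    if (--h->buf->refs == 0) {
        free(h->buf->bytes);
        free(h->buf);
    }
    free(h);
}
```

If in-place modification were forbidden outright, demo_escape_cow() in this 
model could simply return its argument, and the extra header would never be 
allocated.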

The NQP-rx benchmark represents some likely HLL performance:

        ./parrot ext/nqp-rx/nqp-rx.pbc --target=pir Actions.pm

Roughly 72% of all STRING COW headers created are for internal bookkeeping only 
-- to prevent the accidental modification of a STRING out from underneath 
something else that uses it.  This occurs in two places in the benchmark.  The 
first is when fetching the STRING contents of a Key PMC.  The second is when 
using a constant STRING (one created with CONST_STRING in our .c files, for 
example, or appearing as a literal in PIR) as a parameter to a function.

Another occasion which does not appear in this benchmark is when fetching the 
name of a Class.  (You can imagine how modifying that STRING in place would 
cause problems.)

Note that the String PMC's get_string() vtable entry always returns a COW 
STRING.  The set S, SC opcode performs COW on the STRING constant.

Removing the always-COW from the Key PMC (when dealing with STRINGs) speeds up 
the benchmark by 2.504%.

Removing the always-COW from constant STRINGs used as function parameters 
speeds up the benchmark by 1.204%.

Both together speed up the benchmark by 3.678%.

This particular benchmark shows no change in GC performance, which suggests 
that its GC pressure comes primarily from PMCs.  A benchmark whose STRING 
usage puts more pressure on the GC would likely show more benefit.

A couple of test files show failures with these changes, but they're where you 
might expect them:

t/op/string.t                      (Wstat: 11 Tests: 392 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 411 tests but ran 392.
t/pmc/key.t                        (Wstat: 11 Tests: 8 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 9 tests but ran 8.

-- c
