On Tuesday 13 November 2001 01:20 pm, Jason Gloudon wrote:
> On Mon, Nov 12, 2001 at 11:59:08PM -0500, Michael L Maraist wrote:
> > 2)
> > Can we assume that a "buffer object" is ONLY accessible by a single
> > reverse-path-to-PMC? PMC's or array-buffers can point to other PMC's, so
> > it's possible to have multiple paths from the root to an object, but I'm
> > asking if, for example, we could use an algorithm that relied upon the
> > fact that it only has one direct parent.
>
> That assumption would mean one could not directly share non-constant
> buffers between strings. There has been talk of having copy-on-write
> strings.
That's fine. I didn't really have strings in mind, and it's possible to
treat them separately. We already have to treat handles differently
from buffer-objects. I also want to segregate the lists of leaf-buffers,
arrays, hashes, etc., so as to avoid putting switch-statements in the inner
loop of the marker. Since each type would have a different inner loop,
strings aren't confined to the same algorithms. So, any other ideas related
specifically to PMCs?
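To make the segregated-list idea concrete, here is a rough sketch in C. All of the names (Leaf, Array, mark_arrays, count_live_leaves) are made up for illustration and are not Parrot's; it also assumes arrays hold only leaves, which the real thing would not. The point is just that each list gets its own loop, so the marker never has to switch on an object's type.

```c
#include <stddef.h>

/* Hypothetical object layouts; none of these names come from Parrot. */
typedef struct Leaf {
    int marked;
    struct Leaf *next;      /* next leaf-buffer in the leaf list */
} Leaf;

typedef struct Array {
    int marked;
    size_t len;
    Leaf **items;           /* assumption: arrays only hold leaves here */
    struct Array *next;     /* next array in the array list */
} Array;

/* Inner loop for arrays only: no type switch, every node is an Array.
 * Returns the number of leaves newly marked. */
static size_t mark_arrays(Array *list)
{
    size_t n = 0;
    for (Array *a = list; a != NULL; a = a->next) {
        if (!a->marked)
            continue;       /* only reachable arrays propagate marks */
        for (size_t i = 0; i < a->len; i++) {
            if (a->items[i] != NULL && !a->items[i]->marked) {
                a->items[i]->marked = 1;
                n++;
            }
        }
    }
    return n;
}

/* Inner loop for leaves only: nothing to trace, just count survivors. */
static size_t count_live_leaves(const Leaf *list)
{
    size_t n = 0;
    for (const Leaf *l = list; l != NULL; l = l->next)
        n += l->marked ? 1 : 0;
    return n;
}
```

Strings, being leaves, would sit in a list like the Leaf one and could get whatever treatment suits them without affecting the other loops.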
One interesting thing to note is that strings are leaf-objects. Further,
copy-on-write would only be efficient if set for values that have more than
one parent (otherwise each modification would require a copy). I haven't
seen the discussion, so I don't know where it'll wind up, but if this is the
case, then it seems to me that the only efficient way to determine whether
copy-on-write should be applied is via some variation of reference
counting, even if only 1-bit ref-counting is used. Such as:
// In pseudo-code
STR_REG[x] = newString("abc");   // f_copy_on_write = 0
STR_REG[x] _= newString("foo");  // check f_c_o_w; it was 0, so we modify the
                                 // string in place (possibly resizing)
STR_REG[x] = PL_null_str;        // the string is eventually reclaimed by GC
STR_REG[x] = newString("abc");   // f_c_o_w = 0
STR_REG[y] = STR_REG[x];         // f_c_o_w = 1
STR_REG[x] _= newString("bar");  // check f_c_o_w; was 1, so we copy out
// At this point, a GC pass would count the handles to "abc" and notice
// that it has only one. Its f_c_o_w would be reset to 0.
Even if the GC didn't perform this last stage, the system would still work; it
would just cause a greater percentage of multi-ref'd garbage and copying.
Additionally, note that such a pass would imply a pre-GC stage which
resets the f_c_o_w flags to zero.
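For what it's worth, the 1-bit scheme might look roughly like this in C. This is only a sketch: Str, str_new, str_share, str_append and str_gc_fixup are names I'm inventing here, the buffers are plain NUL-terminated malloc'd memory, and the handle count passed to str_gc_fixup is assumed to come from whatever the GC pass computes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer; only the one-bit flag mirrors f_c_o_w. */
typedef struct Str {
    char *data;
    size_t len;
    unsigned cow : 1;       /* 1 = may have more than one handle */
} Str;

static Str *str_new(const char *s)
{
    Str *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->len = strlen(s);
    p->data = malloc(p->len + 1);
    if (p->data == NULL) { free(p); return NULL; }
    memcpy(p->data, s, p->len + 1);
    p->cow = 0;
    return p;
}

/* Register-to-register assignment: share the buffer and set the flag. */
static Str *str_share(Str *s)
{
    s->cow = 1;             /* a plain store, not an increment */
    return s;
}

/* Concatenate-assign. Returns the buffer the register should now hold,
 * or NULL on allocation failure (the original is left untouched). */
static Str *str_append(Str *s, const char *tail)
{
    size_t tlen = strlen(tail);
    Str *dst = s;
    if (s->cow) {           /* possibly shared: copy out first */
        dst = str_new(s->data);
        if (dst == NULL) return NULL;
    }
    char *grown = realloc(dst->data, dst->len + tlen + 1);
    if (grown == NULL) {
        if (dst != s) { free(dst->data); free(dst); }
        return NULL;
    }
    memcpy(grown + dst->len, tail, tlen + 1);
    dst->data = grown;
    dst->len += tlen;
    return dst;
}

/* What a GC pass could do once it has counted the handles to a buffer. */
static void str_gc_fixup(Str *s, size_t handles)
{
    if (handles <= 1)
        s->cow = 0;
}
```

Note that after the copy-out the old buffer keeps its flag set until the GC gets around to it, which is exactly the extra copying mentioned above.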
Setting f_c_o_w to 1 all the time is slightly faster than performing an
increment. Further, separate string vtables could be utilized to avoid the
if-statements (a shared string would be an RO string instead of an RW string).
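The vtable variant could be sketched like so, again in C with invented names (VStr, StrVtable, rw_vtable, ro_vtable) and a one-slot vtable. I'm assuming a fixed 64-byte buffer purely to keep it short. Sharing swaps the vtable pointer, so the append path never tests a flag.

```c
#include <stdlib.h>
#include <string.h>

typedef struct VStr VStr;

/* Hypothetical one-entry vtable; a real string vtable has many slots. */
typedef struct StrVtable {
    VStr *(*append)(VStr *self, const char *tail);
} StrVtable;

struct VStr {
    const StrVtable *vt;
    char buf[64];           /* fixed capacity keeps the sketch short */
};

static VStr *rw_append(VStr *self, const char *tail);
static VStr *ro_append(VStr *self, const char *tail);

static const StrVtable rw_vtable = { rw_append };
static const StrVtable ro_vtable = { ro_append };

/* RW string: sole owner, so modify in place (truncating at capacity). */
static VStr *rw_append(VStr *self, const char *tail)
{
    size_t used = strlen(self->buf);
    strncat(self->buf, tail, sizeof self->buf - used - 1);
    return self;
}

/* RO string: possibly shared, so copy out and append to the copy. */
static VStr *ro_append(VStr *self, const char *tail)
{
    VStr *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    memcpy(copy->buf, self->buf, sizeof copy->buf);
    copy->vt = &rw_vtable;  /* the fresh copy has a single owner */
    return rw_append(copy, tail);
}

static VStr *vstr_new(const char *s)
{
    VStr *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->vt = &rw_vtable;
    p->buf[0] = '\0';
    strncat(p->buf, s, sizeof p->buf - 1);
    return p;
}

/* Sharing swaps the vtable instead of setting a flag. */
static VStr *vstr_share(VStr *s)
{
    s->vt = &ro_vtable;
    return s;
}
```

A caller would always go through s->vt->append(s, tail) and use the returned pointer; the branch is replaced by the indirect call that is being made anyway.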
Lastly, a one-bit flag will probably cost a whole byte anyway, and the garbage
reduction from actually performing full reference counting (albeit with a GC
fallback) might make this worthwhile. Full ref-counting on strings would be
harder to work with for XS code, though.
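One way the byte-wide count with a GC fallback might go, sketched in C. The names (RcStr, rc_incref, rc_decref, rc_is_shared) are hypothetical, and the "sticky" saturation rule is my own assumption about what the fallback would mean: once the count hits the maximum it is never touched again and the string is left for the GC.

```c
#include <stdint.h>

#define RC_STICKY UINT8_MAX /* assumption: a saturated count is left to the GC */

typedef struct RcStr {
    uint8_t refs;           /* starts at 1 for the creating handle */
    int freed;              /* stands in for actually releasing the buffer */
} RcStr;

static void rc_incref(RcStr *s)
{
    if (s->refs != RC_STICKY)
        s->refs++;
}

/* Returns 1 if the string was released immediately, 0 otherwise. */
static int rc_decref(RcStr *s)
{
    if (s->refs == RC_STICKY)
        return 0;           /* count saturated earlier; only the GC can free it */
    if (--s->refs == 0) {
        s->freed = 1;
        return 1;
    }
    return 0;
}

/* The copy-on-write test falls out of the count. */
static int rc_is_shared(const RcStr *s)
{
    return s->refs > 1;
}
```

The XS difficulty is visible here too: every piece of extension code that stores or drops a handle would have to call rc_incref and rc_decref correctly.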
-Michael