I put some more thoughts onto this and ran a few test implementations.

- The idea of over-allocating memory when creating slices is not feasible:
the new space would fill up far too quickly, triggering promotions and
causing slices to be converted to seq strings, which is costly and largely
nullifies the performance advantage of slices altogether.

- As Erik pointed out in IM, not doing anything about huge parent strings
being pinned down seems rather attractive, as the cases where this would
cause a problem are really rare. VMs of other languages mostly do the same
(Java, for example).

I'm still not sure about the risks of space explosion when promoting and
inflating strings. Whenever we promote an object, some space in the new
space becomes vacant and the same amount of space in the old space becomes
occupied in return; the sum of occupied space remains the same, of course.
If we promote a slice and inflate it in the process, space in the new space
likewise becomes vacant and space in the old space becomes occupied in
return. The only difference is that the sum of occupied space grows.
However, the goal of freeing up the new space at the cost of the old space
is still achieved. If I'm not mistaken, the scavenger is not run while we do
a mark-and-compact on the old space. So when we expect the old space to free
up, it actually does, because no slice is inflated during mark-and-compact.

Again, the space required in the old space would have been occupied anyway,
given the old implementation with substrings represented as seq strings. If
inflating slices causes the old space to fill up, resulting in OOM, the same
would happen in the old implementation as well, only sooner.

Any thoughts?

Yang


On Wed, Aug 31, 2011 at 3:18 AM, Vitaly Repeshko <[email protected]> wrote:

> On Tue, Aug 30, 2011 at 8:42 AM, Yang Guo <[email protected]> wrote:
> > The previous implementation creates a substring as a full sequential
> > string in new space and copies that full seq string into old space when
> > it is promoted during scavenging. The proposed new version creates a
> > substring as a slice (pointer to original, length and offset) and
> > inflates it into a full seq string in old space when it is promoted
> > during scavenging. In comparison, the new version does not require more
> > memory than the old version. It merely postpones the creation of a full
> > seq string for a substring, with the advantage that short-lived
> > substrings are never promoted and inflated.
> > For this reason I think that any scenario that would cause an OOM in
> > the new version would also have caused an OOM in the old version,
> > except that the new version would notice the OOM some time later
> > because it has yet to inflate the slices. (Conversely, a scenario that
> > creates a lot of short-lived substrings with very large lengths would
> > cause an OOM in the old version, but would not hit the wall with the
> > new version.)
>
> Not all allocations are created equal. Without slices substring
> allocations happen when GC is not running and it means the heap can
> expand if necessary. During GC the heap should not expand. What's the
> point of doing a GC if we end up using more memory as a result? A safe
> thing to do is to compare memory usage of a parent string with the
> potential total memory usage of its substrings. If the parent is only
> referenced through the substrings and takes more space than they would
> do combined, it can be collected and replaced by simplified
> (sequential) substrings. I think in cases where slices are really
> useful (and not just used to game benchmarks:), they should not become
> sequential strings.
>
> > A way to avoid the space explosion (as mentioned by Erik) when
> > promoting slices into old space is to reserve the same amount of space
> > for string slices as we would for their inflated seq string
> > counterparts. This means that creating slices in new space wastes a
> > lot of space, so the new space fills up more quickly, resulting in
> > more frequent scavenging/promoting, effectively shortening the time
> > until a slice is inflated into a seq string and increasing the chance
> > that a short-lived substring is unnecessarily inflated.
> > Another option would be to abandon this CL and promote slices into the
> > old space. This then creates the problem that strings referenced by
> > slices will not be garbage collected even if they are not referenced
> > elsewhere. This solution would actually be better performance-wise in
> > the dromaeo strings benchmark, but only very slightly.
>
> I leave you in the hands of our GC folks to select the best strategy.
>
> > I've seen the implementation for cons short-circuiting, from which I
> > borrowed some code. Could you give me any pointers on where problems
> > could arise when the string shape is changed during GC? Off the top of
> > my head I can't think of any place in the code where GC may be invoked
> > between making a decision based on the string being a slice and
> > extracting the slice's content.
>
> Make sure our generated deferred code paths do not jump back to a spot
> between a shape check and other code. Look through usages of
> StringShape in the runtime code.
>
>
> Thanks,
> Vitaly
>
> > -Yang
> >
> > On Tue, Aug 30, 2011 at 2:51 AM, <[email protected]> wrote:
> >>
> >> This needs more work.
> >>
> >> Even if we stop simplifying sliced strings when running out of memory
> >> during GC,
> >> it can hurt other object types that can't tolerate allocation failures.
> >>
> >> Whatever scheme we come up with should be memory efficient. In other
> >> words,
> >> simplifying slices should not increase total memory usage.
> >>
> >> And as Erik points out, this change can cause subtle bugs when a
> >> string shape is changed after being inspected. Have you verified our
> >> runtime and generated code to be immune to this?
> >>
> >>
> >> Thanks,
> >> Vitaly
> >>
> >> http://codereview.chromium.org/7736020/
> >
> > --
> > v8-dev mailing list
> > [email protected]
> > http://groups.google.com/group/v8-dev
>

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
