On Fri, Jan 22, 2010 at 4:33 PM, Martin McClure <[email protected]> wrote:

> Eliot Miranda wrote:
> >
> >
> > On Fri, Jan 22, 2010 at 12:35 PM, Stéphane Ducasse
> > <[email protected]> wrote:
> >
> >     hi guys
> >
> >     the more I read about imageSegments the more I would like to remove
> >     them (or to package them
> >     carefully - not sure that this is possible) and may be  add a new
> >     class to
> >     just have one simple way of invoking the save (but not swapping back
> >     in)
> >
> >     I think that mariano diving into them is a great phd exercise but on
> >     the long run
> >     I see it as a brittle mechanism.
> >
> >     what do you think?
> >
> >
> > David Leibs' work on parcels in VW demonstrated that high-performance
> > packaging can be done with no VM support.
>
> Parcels are wonderful things, but my impression is that ImageSegment and
> parcels are designed to do rather different things. Parcels are used for
> physical delivery of packages (primarily code), whereas ImageSegments
> are for arbitrary graphs of objects (primarily not code, although some
> folks have tried to use ImageSegments for code). If parcels are designed
> to be more general than package delivery, I'd like to hear about it.
>

You're talking about parcels' expression in VW, not their underlying nature.
 Parcels' underlying nature is just efficient unpacking of object graphs.
 The code stuff is layered above that, so much so that it obscures the
essentials.  Parcels are at core another pickling format, but one that is
very much faster than standard approaches.  My comparisons of BOSS (VW's
Binary Object Storage System, very similar to ReferenceStream et al) and
Parcels showed parcels to be 4 times faster than BOSS.

Basically you get out of a parcel what you put in, and in VW parcels get
code put in them.  But you can put arbitrary objects in parcels.  Note that
parts of the parcel marshalling code are used in Opentalk.
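
To make "efficient unpacking of object graphs" concrete, here is a minimal
sketch in Python rather than Smalltalk. The dictionary layout, the `Node`
class and the `load_parcel` name are assumptions made up for illustration;
this is not VW's actual parcel format. It only shows the shape of the idea:
allocate everything in batch first, then fill in references by id.

```python
class Node:
    """Stand-in for any class with named slots (hypothetical)."""
    def __init__(self):
        self.label = None
        self.next = None


def load_parcel(parcel, classes):
    """Rebuild an object graph from a toy parcel in two simple loops."""
    table = [None]  # slot 0 unused, so object ids run from 1 to N

    # Literal section: leaf values take the first ids.
    table.extend(parcel["literals"])

    # Allocation section: "N instances of class X", created in batch.
    for class_name, count in parcel["allocations"]:
        factory = classes[class_name]
        for _ in range(count):
            table.append(factory())

    # Reference section: fill slots by id. No decision is needed here
    # about whether to create an object or look one up.
    for owner_id, slot, target_id in parcel["references"]:
        setattr(table[owner_id], slot, table[target_id])

    return table[parcel["root"]]


# A two-node cycle: ids 1-2 are literals, ids 3-4 are Nodes.
parcel = {
    "literals": ["first", "second"],
    "allocations": [("Node", 2)],
    "references": [
        (3, "label", 1), (3, "next", 4),
        (4, "label", 2), (4, "next", 3),
    ],
    "root": 3,
}
root = load_parcel(parcel, {"Node": Node})
```

Cycles need no special handling in this scheme, because every object already
exists before any slot is filled.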

> I suspect that the only VM support that ImageSegments really need are
> the mark-sweep primitives to discover what objects are in the
> ImageSegment. All other algorithms, file format, etc. can (and maybe
> should) be redesigned to be better, but using the GC to find the objects
> is the heart of what ImageSegment is.
>

The core problem is that an image segment is a raw binary snapshot of a part
of the heap, so a particular VM's object header format and tagging scheme are
built into the segment.  If I were to evolve the Squeak garbage collector
and object representation to make Cog significantly faster, as I fully
intend to do this year, image segments will at least break backward
compatibility.  They can only be exchanged between VMs running exactly the
same object representation. Parcels can be loaded into very different
systems; VW loads parcels into either 32-bit or 64-bit images without
difficulty, and parcels written before immutability or ephemerons were added
could be loaded after.

So for me image segments are too low-level and constraining.
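
A toy illustration of why a raw snapshot is tied to one object
representation, sketched in Python. The tagging scheme here (low bit set
marks a small integer) and the names `snapshot` and `read_snapshot` are
assumptions for the example only, not a description of any real segment
format. The same bytes decode correctly only when the reader assumes the
writer's word size.

```python
import struct

WORD_CODES = {4: "I", 8: "Q"}  # unsigned 32-bit and 64-bit words


def snapshot(small_ints, word_size):
    """Dump small integers as raw tagged words (low bit = 1), little-endian."""
    words = [(value << 1) | 1 for value in small_ints]
    return struct.pack(f"<{len(words)}{WORD_CODES[word_size]}", *words)


def read_snapshot(data, word_size):
    """Decode raw tagged words, trusting that word_size matches the writer."""
    count = len(data) // word_size
    words = struct.unpack(
        f"<{count}{WORD_CODES[word_size]}", data[:count * word_size]
    )
    return [word >> 1 for word in words]


segment = snapshot([3, 4], word_size=4)
same_layout = read_snapshot(segment, word_size=4)   # round-trips
other_layout = read_snapshot(segment, word_size=8)  # two words read as one
```

A format that records "two small integers, 3 and 4" instead of raw words
leaves the loader free to pick whatever representation its VM uses.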

>
>
> > When you implement a binary
> > format, carefully designed for unpacking performance, at the image level
> > you get the freedom to add flexibility.  I added shape change support to
> > parcels (and some higher level features that aren't relevant here) after
> > David had left.  So I think the right approach is to reimplement image
> > segments entirely in the image without special VM support and add
> > metadata to the format (class shape information) and you'll probably end
> > up with something that is nearly as performant but much more flexible
> > and evolvable.
> >
> >
> > The two keys to the performance of David's design are the separation of
> > objects from their references and the batching of object allocations.  A
> > parcel file starts with a number of allocations of well-known classes
> > (e.g. this parcel contains 17 large integers of the following sizes, and
> > 3 floats, and 17 symbols of the following sizes etc) followed by an
> > arbitrary number of "N instances of class X".  So the unpacker populates
> > an object table with indices from 1 to N where N is the number of
> > objects in the parcel, but it does so in batch, spinning in a loop
> > creating N instances of each class in turn, instead of determining which
> > object to create as it walks a (flattened) input graph.  After the
> > instance data comes the reference data, which slots refer to which
> > objects.  Again the unpacker can spin filling in slots from the
> > reference data instead of determining whether to instantiate an object
> > or dereference an object id as it walks the input graph.   So loading is
> > much faster than e.g. ReferenceStream-style approaches.
>
> I like that approach for a file format. It probably doesn't even make
> writing the file out much slower; the work has to be done in multiple
> passes, but each pass is simpler. And write speed is important: A parcel
> is typically written once and loaded many times, but one common pattern
> of ImageSegment use is write once, load once, discard, or even write
> once, load never.
>

Right.  Writing out needs to be fast, but IIRC image segments aren't written
incrementally; writing happens after a full walk of the graph has been made.
 So all we're talking about is adding a sort phase that assigns object table
ids in the graph, which shouldn't be expensive.
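
A rough sketch of that write side, in Python rather than Smalltalk. The
classes `Node` and `Tag`, the function `write_parcel` and the output layout
are all hypothetical, chosen only to show the order of work: one full walk
of the graph, then a sort that groups objects by class and hands out object
table ids, then the reference section expressed in those ids.

```python
from itertools import groupby

LITERAL_TYPES = (str, int, float)


class Node:
    def __init__(self, label):
        self.label = label
        self.next = None
        self.tag = None


class Tag:
    def __init__(self, name):
        self.name = name


def class_name(obj):
    return type(obj).__name__


def write_parcel(root):
    # Full walk of the graph first (iterative, identity-based).
    seen, order, stack = {}, [], [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj  # keeps obj alive so id() stays unique
        order.append(obj)
        if not isinstance(obj, LITERAL_TYPES):
            stack.extend(v for v in vars(obj).values() if v is not None)

    # Sort phase: literals first, then objects grouped by class.
    literals = [o for o in order if isinstance(o, LITERAL_TYPES)]
    objects = sorted(
        (o for o in order if not isinstance(o, LITERAL_TYPES)),
        key=class_name,
    )
    ids = {id(o): i for i, o in enumerate(literals + objects, start=1)}

    allocations = [
        (name, len(list(group))) for name, group in groupby(objects, key=class_name)
    ]
    references = [
        (ids[id(o)], slot, ids[id(v)])
        for o in objects
        for slot, v in vars(o).items()
        if v is not None
    ]
    return {
        "literals": literals,
        "allocations": allocations,
        "references": references,
        "root": ids[id(root)],
    }


a, b = Node("first"), Node("second")
a.next, a.tag, b.next = b, Tag("red"), a
parcel = write_parcel(a)
```

The walk visits the objects in an interleaved order, but the sort puts the
two `Node` instances next to each other so a loader can allocate them in one
batch.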


> Regards,
>
> -Martin
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>