Re: [Pharo-project] Fuel - a fast object deployment tool

Nicolas Cellier Fri, 17 Jun 2011 15:10:40 -0700

2011/6/17 Eliot Miranda <[email protected]>:
>
>
> On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier
> <[email protected]> wrote:
>>
>> 2011/6/17 Eliot Miranda <[email protected]>:
>> >
>> >
>> > On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[email protected]>
>> > wrote:
>> >>
>> >> Hi Eliot,
>> >> I am very happy to read your mail.
>> >>
>> >> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda
>> >> <[email protected]>
>> >> wrote:
>> >>>
>> >>> Hi Martin & Mariano,
>> >>>     regarding filtering.  Yesterday my colleague Yaron and I
>> >>> successfully
>> >>> finished our port of Fuel to Newspeak and are successfully using it to
>> >>> save
>> >>> and restore our data sets; thank you, it's a cool framework.  We had to
>> >>> implement two extensions, the first of which is the ability to save and
>> >>> restore
>> >>> Newspeak classes, which is complex because these are instantiated
>> >>> classes
>> >>> inside instantiated Newspeak modules, not static Smalltalk classes in
>> >>> the
>> >>> Smalltalk dictionary.  The second extension is the ability to map
>> >>> specific
>> >>> objects to nil, to prune objects on the way out.  I want to discuss
>> >>> this
>> >>> latter extension.
>> >>> In our data set we have a set of references to objects that are
>> >>> logically
>> >>> not persistent and hence not to be saved.  I'm sure that this will be
>> >>> a
>> >>> common case.  The requirement is for the pickling system to prune
>> >>> certain
>> >>> objects, typically by arranging that when an object graph is pickled,
>> >>> references to the pruned objects are replaced by references to nil.
>> >>>  One way
>> >>> of doing this is as described below, by specifying per-class lists of
>> >>> instance variables whose referents should not be saved.  But this can
>> >>> be
>> >>> clumsy; there may be references to objects one wants to prune from
>> >>> e.g. more
>> >>> than one class, in which case one may have to provide multiple lists
>> >>> of the
>> >>> relevant inst vars; there may be references to objects one wants to
>> >>> prune
>> >>> from e.g. collections (e.g. sets and dictionaries) in which case the
>> >>> instance variable list approach just doesn't work.
>> >>> Here are two more general schemes.  First, most directly, Fuel could
>> >>> provide two filters, implemented in the default mapper, or the core
>> >>> analyser.  One is a set of classes whose instances are not to be
>> >>> saved.  Any
>> >>> reference to an instance of a class in the toBePrunedClasses set is
>> >>> saved as
>> >>> nil.  The other is a set of instances that are not to be saved, and
>> >>> also any
>> >>> reference to an instance in the toBePruned set is saved as nil.  Why
>> >>> have
>> >>> both?  It can be convenient and efficient to filter by class (in our
>> >>> case we
>> >>> had many instances of a specific class, all of which should be
>> >>> filtered, and
>> >>> finding them could be time consuming), but filtering by class can be
>> >>> too
>> >>> inflexible, there may indeed be specific instances to exclude (think
>> >>> for
>> >>> example of part of the object graph that functions as a cache; pruning
>> >>> the
>> >>> specific objects in the cache is the right thing to do; pruning all
>> >>> instances of classes whose instances exist in the cache may prune too
>> >>> much).
>> >>> As an example here's how we implemented pruning.  Our system is called
>> >>> Glue, and we start with a mapper for Glue objects, FLGlueMapper:
>> >>> FLMapper subclass: #FLGlueMapper
>> >>> instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster
>> >>> modelClasses'
>> >>> classVariableNames: ''
>> >>> poolDictionaries: ''
>> >>> category: 'Fuel-Core-Mappers'
>> >>> It accepts newspeak objects and filters instances in the
>> >>> prunedObjectClasses set, and as a side-effect collects certain
>> >>> classes that
>> >>> we need in a manifest:
>> >>> FLGlueMapper>>accepts: anObject
>> >>> "Tells if the received object is handled by this analyzer.  We want to
>> >>> hand-off
>> >>> instantiated Newspeak classes to the newspeakClassesCluster, and we
>> >>> want
>> >>> to record other model classes.  We want to filter-out instances of any
>> >>> class
>> >>> in prunedObjectClasses."
>> >>>     ^anObject isBehavior
>> >>>         ifTrue:
>> >>>             [(self isInstantiatedNewspeakClass: anObject)
>> >>>                 ifTrue: [true]
>> >>>                 ifFalse:
>> >>>                     [(anObject inheritsFrom: GlueDataObject) ifTrue:
>> >>>                         [modelClasses add: anObject].
>> >>>                     false]]
>> >>>         ifFalse:
>> >>>             [prunedObjectClasses includes: anObject class]
>> >>> It prunes by mapping instances of the prunedObjectClasses to a special
>> >>> cluster.  It can do this in visitObject: since any newspeak objects it
>> >>> is
>> >>> accepting will be visited in its visitClassOrTrait: method (i.e. it's
>> >>> implicit that all arguments to visitObject: are instances of classes in
>> >>> the prunedObjectClasses set).
>> >>> FLGlueMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>> FLPrunedObjectsCluster is a specialization of the nil,true,false
>> >>> cluster
>> >>> that maps its objects to nil:
>> >>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>> >>> instanceVariableNames: ''
>> >>> classVariableNames: ''
>> >>> poolDictionaries: ''
>> >>> category: 'Fuel-Core-Clusters'
>> >>> FLPrunedObjectsCluster >>serialize: aPrunedObject on: aWriteStream
>> >>> super serialize: nil on: aWriteStream
>> >>>
>> >>> So this would generalize by the analyser having an e.g.
>> >>> FLPruningMapper
>> >>> as the first mapper, and this having a prunedObjects and a
>> >>> prunedObjectClasses set and going something like this:
>> >>> FLPruningMapper>>accepts: anObject
>> >>>     ^(prunedObjects includes: anObject)
>> >>>         or: [prunedObjectClasses includes: anObject class]
>> >>> FLPruningMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>> and then one would provide accessors in FLSerializer and/or FLAnalyser
>> >>> to
>> >>> add objects and classes to the prunedObjects and prunedObjectClasses
>> >>> set.
>> >>> For efficiency one could arrange that the FLPruningMapper was not
>> >>> added
>> >>> to the sequence of mappers unless and until objects or classes were
>> >>> added
>> >>> to the prunedObjects and prunedObjectClasses set.
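
To make the two filters concrete, here is a minimal playground sketch that uses only plain Pharo collections and no Fuel classes. The block name shouldPrune and the sample objects are my own illustrations, and I am assuming membership in prunedObjects is tested by identity, which the proposal above does not spell out.

```smalltalk
"Sketch only: models the two pruning filters with plain collections."
| prunedObjects prunedObjectClasses shouldPrune cacheEntry graph saved |
prunedObjects := IdentitySet new.          "specific instances to drop"
prunedObjectClasses := IdentitySet new.    "classes whose instances are all dropped"

"Same test as the proposed FLPruningMapper>>accepts:"
shouldPrune := [:each |
    (prunedObjects includes: each)
        or: [prunedObjectClasses includes: each class]].

cacheEntry := 'cached value' copy.
prunedObjects add: cacheEntry.
prunedObjectClasses add: Semaphore.

graph := Array
    with: cacheEntry              "pruned: registered instance"
    with: 'cached value' copy     "kept: equal but not identical"
    with: Semaphore new           "pruned: instance of a pruned class"
    with: 42.                     "kept"

"References to pruned objects are written out as nil."
saved := graph collect: [:each |
    (shouldPrune value: each) ifTrue: [nil] ifFalse: [each]].
```

The second element shows why the instance filter is useful next to the class filter: another String survives even though one particular String was pruned.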
>> >>
>> >> Excellent. I love the botanical metaphor of pruning! Of course we can
>> >> include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>> >>
>> >> We are also interested in pruning objects but not necessarily replacing
>> >> them by nil, but by other user-defined objects. For example proxies.
>> >> We
>> >> can extend the pruning stuff for doing that.
>> >
>> > That was an idea Yaron came up with.  That instead of
>> > using fuelIgnoredInstanceVariableNames one uses e.g.
>> > Object>>objectToSerialize
>> >     ^self
>> > and then if one wants to prune specific inst vars in MyClass one
>> > implements
>> > MyClass>>objectToSerialize
>> >     ^self shallowCopy prepareForSerialization
>>
>> Hi Eliot,
>>
>> I'm not convinced by the shallowCopy solution, except for the simple
>> structures.
>> If the object graph is complex (has shared nodes, loops, ...) then you
>> are going to end up with a replication problem equivalent to the one Fuel
>> is trying to solve.
>
> The assumption is that the analyser would create a maximum of one proxy per
> object in the graph (default, no proxy) and that it would map objects with
> proxies to their proxies.  So if proxies only nilled out inst vars I don't
> see a problem.  What's attractive about this is that it provides a general
> solution to a couple of problems, a) how to replace a class of objects by
> some substitute (e.g. nil), b) how to prune state that needn't be saved.  It
> is also conceptually simple; one just creates a proxy instance; no defining
> metadata, such as inst var names, and hence the code is always up-to-date
> (e.g. a class redefine won't automatically uncover renamed inst vars in
> serialization metadata).


Ah, OK, it occurs after the graph analysis, which I did not catch at first read.
Now I understand better.
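
For the record, here is how I now picture it, as a minimal playground sketch with plain Pharo collections and no Fuel classes. The names (substitutes, resolve, node, ...) are made up for illustration, and the real analyser may well organize this differently. The point is that with at most one substitute per object, looked up by identity, shared nodes stay shared and the original is never modified.

```smalltalk
"Sketch only: one substitute per object, looked up by identity."
| shared cache node referrerA referrerB proxy substitutes resolve |
shared := OrderedCollection with: #payload.
cache := Object new.                       "stands for non-persistent state"
node := Array with: shared with: cache.    "slot 2 must not be saved"
referrerA := Array with: node.
referrerB := Array with: node.             "node is shared by two referrers"

"The substitute is a shallow copy with the unwanted slot nilled out."
proxy := node shallowCopy.
proxy at: 2 put: nil.
substitutes := IdentityDictionary new.
substitutes at: node put: proxy.

"Resolve a reference: the registered substitute, else the object itself."
resolve := [:each | substitutes at: each ifAbsent: [each]].

"Both referrers resolve to the very same substitute."
(resolve value: (referrerA at: 1)) == (resolve value: (referrerB at: 1)).
"The untouched slot still points at the original shared object."
((resolve value: node) at: 1) == shared.
"The original keeps its state."
(node at: 2) == cache.
```

All three trailing expressions should answer true, so the replication problem I was worried about does not arise as long as the substitution happens per object and not per reference.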

Nicolas

>>
>> Nicolas
>>
>> > MyClass>>prepareForSerialization
>> >     instVarIDontWantToSerialize := nil.
>> >     ^self
>> > and for objects one doesn't want to serialize one implements
>> > MyNotToBeSerializedClass>>objectToSerialize
>> >     ^nil
>> > So it's more general.  But I would pass the analyser in as an argument,
>> > which
>> > would allow things like
>> > MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self)
>> >         ifFalse: [self]
>> >         ifTrue: [nil]
>> > which would of course be the default in Object:
>> > Object>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
>> >
>> >>
>> >>
>> >>>
>> >>> I think both Yaron and I feel the Fuel framework is comprehensible and
>> >>> flexible.  We enjoyed using it and while we took two passes at coming
>> >>> up
>> >>> with the pruning scheme we liked (our first was based on not
>> >>> serializing
>> >>> specific inst vars and was much more complex than our second, based on
>> >>> pruning instances of specific classes) we got there quickly and will
>> >>> very
>> >>> little frustration along the way.  Thank you very much.
>> >>
>> >> :-) thank you!
>> >>
>> >>>
>> >>> Finally, a couple of things.  First, it may be more flexible to
>> >>> implement
>> >>> fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to
>> >>> override certain parts of the mapping framework an implementation can
>> >>> access
>> >>> the analyser to find existing clusters, e.g.
>> >>> MyClass>>fuelClusterIn: anFLAnalyser
>> >>>     ^self shouldBeInASpecialCluster
>> >>>         ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>> >>>         ifFalse: [super fuelClusterIn: anFLAnalyser]
>> >>> This makes it easier to find a specific unique cluster to handle a
>> >>> group
>> >>> of objects specially.
>> >>
>> >> I can't imagine a concrete example but I see that it is more
>> >> flexible...
>> >> the cluster obtained via double dispatch can be anything polymorphic
>> >> with
>> >> MySpecialCluster... that's the point?
>> >
>> > To be honest I'm not sure.  But passing in the analyser in things like
>> > fuelCluster or objectToSerialize is I think a good idea as it provides a
>> > convenient communication path which in turn provides considerable
>> > flexibility.
>> >>
>> >>
>> >>>
>> >>> Lastly, the class-side cluster ids are a bit of a pain.  It would be
>> >>> nice
>> >>> to know a) are these byte values or general integer values, i.e. can
>> >>> there
>> >>> be more than 256 types of cluster?, and b) is there any meaning to the
>> >>> ids?
>> >>>  For example, are clusters ordered by id, or is this just an integer
>> >>> tag?
>> >>>  Also, some class-side code to assign an unused id would be nice.
>> >>> You might think of virtualizing the id scheme.  For example, if
>> >>> FLCluster
>> >>> maintained a weak array of all its subclasses then the id of a cluster
>> >>> could
>> >>> be the index in the array, and the array could be cleaned up
>> >>> occasionally.
>> >>>  Then each fuel serialization could start with the list of cluster
>> >>> class
>> >>> names and ids, so that specific values of ids are specific to a
>> >>> particular
>> >>> serialization.
>> >>
>> >> I do agree, these ids are a heritage from the first prototypes of
>> >> fuel,
>> >> they should be revised. a) yes, it is encoded in only one byte; b) just
>> >> an
>> >> integer tag, the only purpose of the id was for decoding fast: read a
>> >> byte
>> >> and then look in a dictionary for the corresponding cluster instance.
>> >> We
>> >> could even store the cluster class name but that's inefficient.
>> >
>> > Yes, but how inefficient?  What's the size of all the cluster names?
>> >     FLCluster allSubclasses inject: 0 into: [:t :c| t + c name size + 1]
>> > 670
>> >
>> > So you'd add less than a kilobyte to the size of each serialization and
>> > get
>> > complete freedom from ids.  Something to think about.
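
A rough playground sketch of such a per-serialization id table, using only plain collections. The three names are cluster classes mentioned in this thread, used here as symbols only; idOf, nameOf and headerBytes are hypothetical and nothing here is Fuel's actual format.

```smalltalk
"Sketch only: ids are positions in a header written once per serialization."
| clusterNames idOf nameOf headerBytes |
clusterNames := #(#FLFixedObjectCluster #FLVariableObjectCluster #FLPrunedObjectsCluster).

"Writer side: the id of a cluster is its index in the header."
idOf := Dictionary new.
clusterNames withIndexDo: [:name :index | idOf at: name put: index].

"Reader side: the header alone is enough to map an id back to a name."
nameOf := [:id | clusterNames at: id].

"Header cost, counted the same way as above: name size plus one separator."
headerBytes := clusterNames inject: 0 into: [:total :name | total + name size + 1].

idOf at: #FLPrunedObjectsCluster.    "3"
nameOf value: 3.                     "#FLPrunedObjectsCluster"
```

Since the ids only mean something inside one file, two independent extensions cannot collide, and only the clusters actually used need to appear in the header.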
>> >>
>> >> Virtualizing the id scheme is a good idea. Much more elegant and
>> >> extensible. The current mechanism not only limits the number of
>> >> possible
>> >> clusters, but also "user defined" extensions can collide, for example
>> >> if
>> >> your Glue cluster id is the same as the Moose cluster id.
>> >>
>> >> I added an issue in our tracker.
>> >>
>> >> If it makes sense, maybe the weak array you suggest can be also used to
>> >> avoid instantiating lots of FLObjectCluster like we are doing in
>> >> Object:
>> >>
>> >> fuelCluster
>> >>     ^ self class isVariable
>> >>         ifTrue: [ FLVariableObjectCluster for: self class ]
>> >>         ifFalse: [ FLFixedObjectCluster for: self class ]
>> >>
>> >> the second time you send fuelCluster to an object, it can reuse the
>> >> cluster instance.
>> >
>> > Right.  I think that's important, and is one reason why I think passing
>> > in
>> > the analyser is important, because it allows certain objects to discover
>> > existing clusters in the analyzer and join them if they want to, instead
>> > of
>> > having to invent and maintain their own cluster uniquing solution.
>> >>>
>> >>> again thanks for a great framework.
>> >>
>> >> Thanks for your words and the feedback. Is Glue published somewhere?
>> >
>> > No, and it's extremely proprietary :)  Newspeak however is available and
>> > we
>> > may end up maintaining a port of Fuel for Newspeak.
>> > best regards,
>> > Eliot
>> >
>> >>
>> >> regards
>> >> Martin
>> >>
>> >>
>> >>>
>> >>> best,
>> >>> Eliot
>> >>
>> >>
>> >>>
>> >>> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi Martin and Mariano,
>> >>>>>     a couple of questions.  What's the right way to exclude certain
>> >>>>> objects from the serialization?  Is there a way of excluding certain
>> >>>>> inst
>> >>>>> vars from certain objects?
>> >>>>
>> >>>>
>> >>>> Eliot and the rest....Martin implemented this feature in
>> >>>> Fuel-MartinDias.258. For the moment, we decided to put
>> >>>> #fuelIgnoredInstanceVariableNames at class side.
>> >>>>
>> >>>> Behavior >> fuelIgnoredInstanceVariableNames
>> >>>>     "Indicates which variables have to be ignored during
>> >>>> serialization."
>> >>>>
>> >>>>     ^#()
>> >>>>
>> >>>>
>> >>>> MyClass class >> fuelIgnoredInstanceVariableNames
>> >>>>   ^ #('instVar1')
>> >>>>
>> >>>>
>> >>>> The impact in speed is nothing, so this is good. Now....we were
>> >>>> thinking
>> >>>> if it is common to need that 2 different instances of the same class
>> >>>> need
>> >>>> different instVars to ignore. Is this common ? do you usually need
>> >>>> this ?
>> >>>> We checked in SIXX and it is at instance side. Java uses the
>> >>>> 'transient' modifier, so it is at class side...
>> >>>>
>> >>>> thanks
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Mariano
>> >>>> http://marianopeck.wordpress.com
>> >>>>
>> >>>
>> >>
>> >
>> >
>
>
> --
> best,
> Eliot
>
