Re: [Pharo-project] Fuel - a fast object deployment tool

Martin Dias Sun, 19 Jun 2011 22:12:11 -0700

I had some problems when I tried to write a package export tool with Fuel. I
wanted to store the classes without the method extensions from other packages.
Maybe with the objectToSerializeIn: idea I can write:


Class>>objectToSerializeIn: anFLAnalyser
    ^(anFLAnalyser shouldAvoidForeignProtocol: self)
        ifFalse: [self]
        ifTrue: [self copyWithoutForeignProtocol]
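
A minimal sketch of the analyser side of that hook, assuming the substitution
happens while the graph is traced. FLSubstitutingAnalyser and all of its
selectors are hypothetical names, not part of Fuel, and the sketch assumes
every object answers objectToSerializeIn: as proposed in this thread. It only
follows named inst vars; indexed slots are left out for brevity.

```smalltalk
"Hypothetical sketch; none of these names exist in Fuel."
Object subclass: #FLSubstitutingAnalyser
    instanceVariableNames: 'substitutions traced'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Sketches'

FLSubstitutingAnalyser>>initialize
    super initialize.
    substitutions := IdentityDictionary new.   "original -> substitute"
    traced := IdentitySet new

FLSubstitutingAnalyser>>shouldPrune: anObject
    "Stub: prune nothing by default."
    ^false

FLSubstitutingAnalyser>>shouldAvoidForeignProtocol: aClass
    "Stub: keep foreign extensions by default."
    ^false

FLSubstitutingAnalyser>>substituteFor: anObject
    "Ask the object for its substitute and remember the answer."
    ^substitutions
        at: anObject
        ifAbsentPut: [anObject objectToSerializeIn: self]

FLSubstitutingAnalyser>>trace: anObject
    "Trace the substitute, not the original, so that the subgraph
    hanging off a proxy is followed as well."
    | substitute |
    anObject isNil ifTrue: [^self].
    substitute := self substituteFor: anObject.
    (substitute isNil or: [traced includes: substitute]) ifTrue: [^self].
    traced add: substitute.
    1 to: substitute class instSize do: [:index |
        self trace: (substitute instVarAt: index)]
```

Remembering each answer in substitutions is one way to keep the mapping at
one substitute per original object.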

Cheers,
Martin

On Mon, Jun 20, 2011 at 1:48 AM, Martin Dias <[email protected]> wrote:

> I think the substitution should be done during the graph trace. Following
> with the example, if a proxy replaces an object, the proxy represents a
> subgraph that is appended and so it should be traced.
>
> For that we should keep track of the substitutions. I'm not sure how
> complex that is, but I think it's not so difficult.
>
> Seems to be a great idea, we have to try it. I like that it avoids writing
> inst var names as strings. I have no idea whether, once *slots* are
> implemented, we will be able to return inst vars as first-class objects...
> but anyway this looks like a nice solution.
>
> So, we have this as a pending issue as well as the id virtualization.
> Thanks for the ideas and the discussion!
>
> Martin
>
>
> On Fri, Jun 17, 2011 at 7:09 PM, Nicolas Cellier <
> [email protected]> wrote:
>
>> 2011/6/17 Eliot Miranda <[email protected]>:
>> >
>> >
>> > On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier
>> > <[email protected]> wrote:
>> >>
>> >> 2011/6/17 Eliot Miranda <[email protected]>:
>> >> >
>> >> >
>> >> > On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> Hi Eliot,
>> >> >> I am very happy to read your mail.
>> >> >>
>> >> >> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda
>> >> >> <[email protected]>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi Martin & Mariano,
>> >> >>>     regarding filtering.  Yesterday my colleague Yaron and I
>> >> >>> successfully
>> >> >>> finished our port of Fuel to Newspeak and are successfully using it
>> to
>> >> >>> save
>> >> >>> and restore our data sets; thank you, it's a cool framework.  We had to
>> >> >>> implement two extensions, the first of which is the ability to save and
>> >> >>> restore
>> >> >>> Newspeak classes, which is complex because these are instantiated
>> >> >>> classes
>> >> >>> inside instantiated Newspeak modules, not static Smalltalk classes
>> in
>> >> >>> the
>> >> >>> Smalltalk dictionary.  The second extension is the ability to map
>> >> >>> specific
>> >> >>> objects to nil, to prune objects on the way out.  I want to discuss
>> >> >>> this
>> >> >>> latter extension.
>> >> >>> In our data set we have a set of references to objects that are
>> >> >>> logically
>> >> >>> not persistent and hence not to be saved.  I'm sure that this will
>> be
>> >> >>> a
>> >> >>> common case.  The requirement is for the pickling system to prune
>> >> >>> certain
>> >> >>> objects, typically by arranging that when an object graph is
>> pickled,
>> >> >>> references to the pruned objects are replaced by references to nil.
>> >> >>>  One way
>> >> >>> of doing this is as described below, by specifying per-class lists of
>> >> >>> instance variables whose referents should not be saved.  But this can be
>> >> >>> clumsy; there may be references to objects one wants to prune from
>> >> >>> e.g. more
>> >> >>> than one class, in which case one may have to provide multiple
>> lists
>> >> >>> of the
>> >> >>> relevant inst vars; there may be references to objects one wants to
>> >> >>> prune
>> >> >>> from e.g. collections (e.g. sets and dictionaries) in which case
>> the
>> >> >>> instance variable list approach just doesn't work.
>> >> >>> Here are two more general schemes.  First, most directly, Fuel could
>> >> >>> provide two filters, implemented in the default mapper, or the core
>> >> >>> analyser.  One is a set of classes whose instances are not to be
>> >> >>> saved.  Any
>> >> >>> reference to an instance of a class in the toBePrunedClasses set is
>> >> >>> saved as
>> >> >>> nil.  The other is a set of instances that are not to be saved, and
>> >> >>> also any
>> >> >>> reference to an instance in the toBePruned set is saved as nil.
>>  Why
>> >> >>> have
>> >> >>> both?  It can be convenient and efficient to filter by class (in
>> our
>> >> >>> case we
>> >> >>> had many instances of a specific class, all of which should be
>> >> >>> filtered, and
>> >> >>> finding them could be time consuming), but filtering by class can
>> be
>> >> >>> too
>> >> >>> inflexible, there may indeed be specific instances to exclude (think,
>> >> >>> for example, of part of the object graph that functions as a cache;
>> pruning
>> >> >>> the
>> >> >>> specific objects in the cache is the right thing to do; pruning all
>> >> >>> instances of classes whose instances exist in the cache may prune
>> too
>> >> >>> much).
>> >> >>> As an example here's how we implemented pruning.  Our system is
>> called
>> >> >>> Glue, and we start with a mapper for Glue objects, FLGlueMapper:
>> >> >>> FLMapper subclass: #FLGlueMapper
>> >> >>>     instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
>> >> >>>     classVariableNames: ''
>> >> >>>     poolDictionaries: ''
>> >> >>>     category: 'Fuel-Core-Mappers'
>> >> >>> It accepts Newspeak objects and filters instances of the classes in the
>> >> >>> prunedObjectClasses set, and as a side-effect collects certain
>> >> >>> classes that
>> >> >>> we need in a manifest:
>> >> >>> FLGlueMapper>>accepts: anObject
>> >> >>>     "Tells if the received object is handled by this analyzer.  We want to
>> >> >>>     hand-off instantiated Newspeak classes to the newspeakClassesCluster,
>> >> >>>     and we want to record other model classes.  We want to filter-out
>> >> >>>     instances of any class in prunedObjectClasses."
>> >> >>>     ^anObject isBehavior
>> >> >>>         ifTrue:
>> >> >>>             [(self isInstantiatedNewspeakClass: anObject)
>> >> >>>                 ifTrue: [true]
>> >> >>>                 ifFalse:
>> >> >>>                     [(anObject inheritsFrom: GlueDataObject) ifTrue:
>> >> >>>                         [modelClasses add: anObject].
>> >> >>>                     false]]
>> >> >>>         ifFalse:
>> >> >>>             [prunedObjectClasses includes: anObject class]
>> >> >>> It prunes by mapping instances of the prunedObjectClasses to a
>> special
>> >> >>> cluster.  It can do this in visitObject: since any newspeak objects
>> it
>> >> >>> is
>> >> >>> accepting will be visited in its visitClassOrTrait: method (i.e. it's
>> >> >>> implicit that all arguments to visitObject: are instances of classes in
>> >> >>> the prunedObjectClasses set).
>> >> >>> FLGlueMapper>>visitObject: anObject
>> >> >>>     analyzer
>> >> >>>         mapAndTrace: anObject
>> >> >>>         to: FLPrunedObjectsCluster instance
>> >> >>>         into: analyzer clustersWithBaselevelObjects
>> >> >>> FLPrunedObjectsCluster is a specialization of the nil,true,false
>> >> >>> cluster
>> >> >>> that maps its objects to nil:
>> >> >>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>> >> >>>     instanceVariableNames: ''
>> >> >>>     classVariableNames: ''
>> >> >>>     poolDictionaries: ''
>> >> >>>     category: 'Fuel-Core-Clusters'
>> >> >>> FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
>> >> >>>     super serialize: nil on: aWriteStream
>> >> >>>
>> >> >>> So this would generalize by the analyser having an e.g.
>> >> >>> FLPruningMapper
>> >> >>> as the first mapper, and this having a prunedObjects and a
>> >> >>> prunedObjectClasses set and going something like this:
>> >> >>> FLPruningMapper>>accepts: anObject
>> >> >>>     ^(prunedObjects includes: anObject)
>> >> >>>         or: [prunedObjectClasses includes: anObject class]
>> >> >>> FLPruningMapper>>visitObject: anObject
>> >> >>>     analyzer
>> >> >>>         mapAndTrace: anObject
>> >> >>>         to: FLPrunedObjectsCluster instance
>> >> >>>         into: analyzer clustersWithBaselevelObjects
>> >> >>> and then one would provide accessors in FLSerializer and/or FLAnalyser
>> >> >>> to add objects and classes to the prunedObjects and prunedObjectClasses
>> >> >>> sets.
>> >> >>> For efficiency one could arrange that the FLPruningMapper was not
>> >> >>> added
>> >> >>> to the sequence of mappers unless and until objects or classes were
>> >> >>> added
>> >> >>> to the prunedObjects and prunedObjectClasses set.
>> >> >>
>> >> >> Excellent. I love the botanical metaphor of pruning! Of course we
>> can
>> >> >> include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>> >> >>
>> >> >> We are also interested in pruning objects, not necessarily replacing
>> >> >> them by nil but by other user-defined objects, for example proxies.
>> >> >> We can extend the pruning stuff for doing that.
>> >> >
>> >> > That was an idea Yaron came up with.  That instead of
>> >> > using fuelIgnoredInstanceVariableNames one uses e.g.
>> >> > Object>>objectToSerialize
>> >> >     ^self
>> >> > and then if one wants to prune specific inst vars in MyClass one
>> >> > implements
>> >> > MyClass>>objectToSerialize
>> >> >     ^self shallowCopy prepareForSerialization
>> >>
>> >> Hi Eliot,
>> >>
>> >> I'm not convinced by the shallowCopy solution, except for simple
>> >> structures.
>> >> If the object graph is complex (has shared nodes, loops, ...) then you
>> >> are going to end up with a replication problem equivalent to the one
>> >> Fuel is trying to solve.
>> >
>> > The assumption is that the analyser would create a maximum of one proxy
>> per
>> > object in the graph (default, no proxy) and that it would map objects
>> with
>> > proxies to their proxies.  So if proxies only nilled out inst vars I
>> don't
>> > see a problem.  What's attractive about this is that it provides a
>> general
>> > solution to a couple of problems, a) how to replace a class of objects
>> by
>> > some substitute (e.g. nil), b) how to prune state that needn't be saved.
>>  It
>> > is also conceptually simple; one just creates a proxy instance; no
>> defining
>> > metadata, such as inst var names, and hence the code is always
>> up-to-date
>> > (e.g. a class redefine won't automatically uncover renamed inst vars in
>> > serialization metadata).
>>
>> Ah, OK, it occurs after the graph analysis, which I did not catch at first
>> read.
>> Now I understand better.
>>
>> Nicolas
>>
>> >>
>> >> Nicolas
>> >>
>> >> > MyClass>>prepareForSerialization
>> >> >     instVarIDontWantToSerialize := nil.
>> >> >     ^self
>> >> > and for objects one doesn't want to serialize one implements
>> >> > MyNotToBeSerializedClass>>objectToSerialize
>> >> >     ^nil
>> >> > So it's more general.  But I would pass the analyser in as an argument,
>> >> > which would allow things like
>> >> > MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>> >> >     ^(anFLAnalyser shouldPrune: self)
>> >> >         ifFalse: [self]
>> >> >         ifTrue: [nil]
>> >> > which would of course be the default in Object:
>> >> > Object>>objectToSerializeIn: anFLAnalyser
>> >> >     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
>> >> >
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> I think both Yaron and I feel the Fuel framework is comprehensible
>> and
>> >> >>> flexible.  We enjoyed using it and while we took two passes at
>> coming
>> >> >>> up
>> >> >>> with the pruning scheme we liked (our first was based on not
>> >> >>> serializing
>> >> >>> specific inst vars and was much more complex than our second, based on
>> >> >>> pruning instances of specific classes) we got there quickly and with
>> >> >>> very little frustration along the way.  Thank you very much.
>> >> >>
>> >> >> :-) thank you!
>> >> >>
>> >> >>>
>> >> >>> Finally, a couple of things.  First, it may be more flexible to
>> >> >>> implement
>> >> >>> fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying
>> to
>> >> >>> override certain parts of the mapping framework an implementation
>> can
>> >> >>> access
>> >> >>> the analyser to find existing clusters, e.g.
>> >> >>> MyClass>>fuelClusterIn: anFLAnalyser
>> >> >>>     ^self shouldBeInASpecialCluster
>> >> >>>         ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>> >> >>>         ifFalse: [super fuelClusterIn: anFLAnalyser]
>> >> >>> This makes it easier to find a specific unique cluster to handle a
>> >> >>> group
>> >> >>> of objects specially.
>> >> >>
>> >> >> I can't imagine a concrete example but I see that it is more
>> >> >> flexible...
>> >> >> the cluster obtained via double dispatch can be anything polymorphic
>> >> >> with
>> >> >> MySpecialCluster... that's the point?
>> >> >
>> >> > To be honest I'm not sure.  But passing in the analyser in things
>> like
>> >> > fuelCluster or objectToSerialize is I think a good idea as it
>> provides a
>> >> > convenient communication path which in turn provides considerable
>> >> > flexibility.
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> Lastly, the class-side cluster ids are a bit of a pain.  It would
>> be
>> >> >>> nice
>> >> >>> to know a) are these byte values or general integer values, i.e.
>> can
>> >> >>> there
>> >> >>> be more than 256 types of cluster?, and b) is there any meaning to
>> the
>> >> >>> ids?
>> >> >>>  For example, are clusters ordered by id, or is this just an
>> integer
>> >> >>> tag?
>> >> >>>  Also, some class-side code to assign an unused id would be nice.
>> >> >>> You might think of virtualizing the id scheme.  For example, if
>> >> >>> FLCluster
>> >> >>> maintained a weak array of all its subclasses then the id of a
>> cluster
>> >> >>> could
>> >> >>> be the index in the array, and the array could be cleaned up
>> >> >>> occasionally.
>> >> >>>  Then each fuel serialization could start with the list of cluster
>> >> >>> class
>> >> >>> names and ids, so that specific values of ids are specific to a
>> >> >>> particular
>> >> >>> serialization.
>> >> >>
>> >> >> I do agree, these ids are a legacy of the first prototypes of Fuel;
>> >> >> they should be revised. a) yes, it is encoded in only one byte; b) just
>> >> >> an integer tag; the only purpose of the id was fast decoding: read a
>> >> >> byte and then look in a dictionary for the corresponding cluster
>> >> >> instance. We could even store the cluster class name but that's
>> >> >> inefficient.
>> >> >
>> >> > Yes, but how inefficient?  What's the size of all the cluster names?
>> >> >     FLCluster allSubclasses inject: 0 into: [:t :c| t + c name size + 1]
>> >> > 670
>> >> >
>> >> > So you'd add less than a kilobyte to the size of each serialization
>> and
>> >> > get
>> >> > complete freedom from ids.  Something to think about.
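
A minimal workspace sketch of such a per-serialization table, assuming the
header is simply a count followed by one cluster class name per line. The
format is an illustration only, not Fuel's actual encoding, and the class
names are used as plain strings:

```smalltalk
"Illustrative header format only; this is not how Fuel encodes its streams."
| clusterNames stream count decoded |
clusterNames := #('FLFixedObjectCluster' 'FLVariableObjectCluster' 'FLPrunedObjectsCluster').

"Write: the number of clusters, then each name on its own line.
 A cluster's id is its position in this list."
stream := WriteStream on: String new.
stream print: clusterNames size; cr.
clusterNames do: [:each | stream nextPutAll: each; cr].

"Read: rebuild the id -> name table from the header alone."
stream := ReadStream on: stream contents.
count := (stream upTo: Character cr) asInteger.
decoded := (1 to: count) collect: [:id | stream upTo: Character cr].
```

With a table like this the ids only have meaning inside one serialization, so
two extensions cannot collide on a hard-coded number.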
>> >> >>
>> >> >> Virtualizing the id scheme is a good idea. Much more elegant and
>> >> >> extensible. The current mechanism not only limits the number of
>> >> >> possible clusters, but "user defined" extensions can also collide, for
>> >> >> example if your Glue cluster id is the same as the Moose cluster id.
>> >> >>
>> >> >> I added an issue in our tracker.
>> >> >>
>> >> >> If it makes sense, maybe the weak array you suggest can be also used
>> to
>> >> >> avoid instantiating lots of FLObjectCluster like we are doing in
>> >> >> Object:
>> >> >>
>> >> >> fuelCluster
>> >> >>     ^ self class isVariable
>> >> >>         ifTrue: [ FLVariableObjectCluster for: self class ]
>> >> >>         ifFalse: [ FLFixedObjectCluster for: self class ]
>> >> >>
>> >> >> the second time you send fuelCluster to an object, it can reuse the
>> >> >> cluster instance.
>> >> >
>> >> > Right.  I think that's important, and is one reason why I think
>> passing
>> >> > in
>> >> > the analyser is important, because it allows certain objects to
>> discover
>> >> > existing clusters in the analyzer and join them if they want to,
>> instead
>> >> > of
>> >> > having to invent and maintain their own cluster uniquing solution.
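
A minimal workspace sketch of that kind of cluster uniquing, assuming the
analyser keeps one table keyed by class. An OrderedCollection stands in for a
real cluster, and clusterFor is a hypothetical name:

```smalltalk
"Stand-in sketch: an OrderedCollection plays the role of a cluster."
| clusters clusterFor first second |
clusters := IdentityDictionary new.   "class -> its single cluster"

"Answer the existing cluster for a class, creating it on first request."
clusterFor := [:aClass |
    clusters at: aClass ifAbsentPut: [OrderedCollection new]].

first := clusterFor value: Point.
second := clusterFor value: Point.
first == second   "true: the second request reuses the first cluster"
```

Keeping the table in the analyser means objects that receive it as an
argument can look up an existing cluster rather than maintain their own.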
>> >> >>>
>> >> >>> again thanks for a great framework.
>> >> >>
>> >> >> Thanks for your words and the feedback. Is Glue published somewhere?
>> >> >
>> >> > No, and it's extremely proprietary :)  Newspeak however is available and
>> >> > we may end up maintaining a port of Fuel for Newspeak.
>> >> > best regards,
>> >> > Eliot
>> >> >
>> >> >>
>> >> >> regards
>> >> >> Martin
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> best,
>> >> >>> Eliot
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck
>> >> >>> <[email protected]> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda
>> >> >>>> <[email protected]>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Hi Martin and Mariano,
>> >> >>>>>     a couple of questions.  What's the right way to exclude
>> certain
>> >> >>>>> objects from the serialization?  Is there a way of excluding
>> certain
>> >> >>>>> inst
>> >> >>>>> vars from certain objects?
>> >> >>>>
>> >> >>>>
>> >> >>>> Eliot and the rest....Martin implemented this feature in
>> >> >>>> Fuel-MartinDias.258. For the moment, we decided to put
>> >> >>>> #fuelIgnoredInstanceVariableNames at class side.
>> >> >>>>
>> >> >>>> Behavior >> fuelIgnoredInstanceVariableNames
>> >> >>>>     "Indicates which variables have to be ignored during
>> >> >>>> serialization."
>> >> >>>>
>> >> >>>>     ^#()
>> >> >>>>
>> >> >>>>
>> >> >>>> MyClass class >> fuelIgnoredInstanceVariableNames
>> >> >>>>   ^ #('instVar1')
>> >> >>>>
>> >> >>>>
>> >> >>>> There is no impact on speed, so this is good. Now... we were
>> >> >>>> wondering whether it is common for 2 different instances of the same
>> >> >>>> class to need different instVars ignored. Is this common? Do you
>> >> >>>> usually need this?
>> >> >>>> We checked in SIXX and it is at instance side. Java uses the
>> >> >>>> 'transient' modifier, so it is at class side...
>> >> >>>>
>> >> >>>> thanks
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Mariano
>> >> >>>> http://marianopeck.wordpress.com
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >
>> >
>> > --
>> > best,
>> > Eliot
>> >
>>
>>
>
