Re: [Pharo-project] Fuel - a fast object deployment tool

Stéphane Ducasse Wed, 15 Jun 2011 11:52:08 -0700

On Jun 15, 2011, at 8:29 PM, Eliot Miranda wrote:

> Hi Martin & Mariano,
> 
>     regarding filtering.  Yesterday my colleague Yaron and I successfully 
> finished our port of Fuel to Newspeak and are successfully using it to save 
> and restore our data sets; thank you, it's a cool framework.


I'm happy to see that we picked good projects and good people: my consulting 
money is well spent.

>  We had to implement two extensions, the first of which is the ability to save 
> and restore Newspeak classes, which is complex because these are instantiated 
> classes inside instantiated Newspeak modules, not static Smalltalk classes in 
> the Smalltalk dictionary.  The second extension is the ability to map 
> specific objects to nil, to prune objects on the way out.  I want to discuss 
> this latter extension.
> 
> In our data set we have a set of references to objects that are logically not 
> persistent and hence not to be saved.  I'm sure that this will be a common 
> case.  The requirement is for the pickling system to prune certain objects, 
> typically by arranging that when an object graph is pickled, references to 
> the pruned objects are replaced by references to nil.  One way of doing this 
> is as described below, by specifying per-class lists of instance variables 
> whose referents should not be saved.  But this can be clumsy; there may be 
> references to objects one wants to prune from e.g. more than one class, in 
> which case one may have to provide multiple lists of the relevant inst vars;

Yes, I imagine that you can have different cases for the same class.

> there may be references to objects one wants to prune from e.g. collections 
> (e.g. sets and dictionaries) in which case the instance variable list 
> approach just doesn't work.
> 
> Here are two more general schemes.  First, most directly, Fuel could provide 
> two filters, implemented in the default mapper, or the core analyser.  One is 
> a set of classes whose instances are not to be saved.

Yes, like TranscriptStream.

>  Any reference to an instance of a class in the toBePrunedClasses set is 
> saved as nil.  The other is a set of instances that are not to be saved, and 
> also any reference to an instance in the toBePruned set is saved as nil.  Why 
> have both?  It can be convenient and efficient to filter by class (in our 
> case we had many instances of a specific class, all of which should be 
> filtered, and finding them could be time consuming), but filtering by class 
> can be too inflexible; there may indeed be specific instances to exclude 
> (think, for example, of part of the object graph that functions as a cache; 
> pruning the specific objects in the cache is the right thing to do; pruning 
> all instances of classes whose instances exist in the cache may prune too 
> much).

Yes I have the impression that we need both too. 
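
A minimal workspace sketch of the two filters discussed above, one by class and one by identity. It uses only base collections and does not touch Fuel; the names prunedClasses, prunedObjects and shouldPrune are hypothetical, and WriteStream merely stands in for a class whose instances should never be saved.

```smalltalk
"Sketch only: plain collections, no Fuel classes. All names are hypothetical."
| prunedClasses prunedObjects shouldPrune cache graph saved |
prunedClasses := IdentitySet with: WriteStream.   "filter by class"
cache := OrderedCollection with: 'cached value'.
prunedObjects := IdentitySet with: cache.         "filter by identity"

"An object is pruned if it is registered itself or if its class is."
shouldPrune := [:each |
    (prunedObjects includes: each)
        or: [prunedClasses includes: each class]].

graph := Array
    with: 42
    with: (WriteStream on: String new)
    with: cache
    with: (OrderedCollection with: 'kept').

"Every pruned reference is replaced by nil; everything else is kept."
saved := graph collect: [:each |
    (shouldPrune value: each) ifTrue: [nil] ifFalse: [each]].

"The stream and the cache come back as nil; the second collection survives."
saved
```

The second OrderedCollection survives because only the cache instance is registered. Adding OrderedCollection to prunedClasses instead would prune both collections, which is the "prunes too much" case.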
> 
> As an example here's how we implemented pruning.  Our system is called Glue, 
> and we start with a mapper for Glue objects, FLGlueMapper:
> 
> FLMapper subclass: #FLGlueMapper
>       instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
>       classVariableNames: ''
>       poolDictionaries: ''
>       category: 'Fuel-Core-Mappers'
> 
> It accepts Newspeak objects and filters instances in the prunedObjectClasses 
> set, and as a side-effect collects certain classes that we need in a manifest:
> 
> FLGlueMapper>>accepts: anObject
>       "Tells if the received object is handled by this analyzer.
>        We want to hand-off instantiated Newspeak classes to the
>        newspeakClassesCluster, and we want to record other model
>        classes.  We want to filter-out instances of any class in
>        prunedObjectClasses."
>       ^anObject isBehavior
>               ifTrue:
>                       [(self isInstantiatedNewspeakClass: anObject) 
>                               ifTrue: [true]
>                               ifFalse:
>                                       [(anObject inheritsFrom: GlueDataObject)
>                                               ifTrue: [modelClasses add: anObject].
>                                       false]]
>               ifFalse:
>                       [prunedObjectClasses includes: anObject class]
> 
> It prunes by mapping instances of the prunedObjectClasses to a special 
> cluster.  It can do this in visitObject: since any Newspeak objects it is 
> accepting will be visited in its visitClassOrTrait: method (i.e. it's 
> implicit that all arguments to visitObject: are instances of classes in the 
> prunedObjectClasses set).
> 
> FLGlueMapper>>visitObject: anObject
> 
>       analyzer 
>               mapAndTrace: anObject  
>               to: FLPrunedObjectsCluster instance
>               into: analyzer clustersWithBaselevelObjects
> 
> FLPrunedObjectsCluster is a specialization of the nil,true,false cluster that 
> maps its objects to nil:
> 
> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>       instanceVariableNames: ''
>       classVariableNames: ''
>       poolDictionaries: ''
>       category: 'Fuel-Core-Clusters'
> 
> FLPrunedObjectsCluster >>serialize: aPrunedObject on: aWriteStream
> 
>       super serialize: nil on: aWriteStream
> 
> 
> So this would generalize by the analyser having an e.g. FLPruningMapper as 
> the first mapper, and this having a prunedObjects and a prunedObjectClasses 
> set and going something like this:
> 
> FLPruningMapper>>accepts: anObject
>       ^(prunedObjects includes: anObject)
>               or: [prunedObjectClasses includes: anObject class]
> 
> FLPruningMapper >>visitObject: anObject
>       analyzer 
>               mapAndTrace: anObject  
>               to: FLPrunedObjectsCluster instance
>               into: analyzer clustersWithBaselevelObjects
> 
> and then one would provide accessors in FLSerializer and/or FLAnalyser to add 
> objects and classes to the prunedObjects and prunedObjectClasses sets.
> 
> For efficiency one could arrange that the FLPruningMapper was not added to 
> the sequence of mappers unless and until objects or classes were added to the 
> prunedObjects and prunedObjectClasses sets.
> 
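
The lazy installation suggested above could look roughly like this. It is a sketch under assumptions, not Fuel code: the mapper chain is modelled as an OrderedCollection of symbols, and pruneClass and installIfNeeded are hypothetical names for the accessor and the install step.

```smalltalk
"Sketch only: the mapper chain is modelled with symbols, not real Fuel mappers."
| mappers prunedObjectClasses pruningMapper installIfNeeded pruneClass |
mappers := OrderedCollection with: #defaultMapper.
prunedObjectClasses := IdentitySet new.
pruningMapper := #pruningMapper.

"Put the pruning mapper first in the chain, at most once."
installIfNeeded := [
    (mappers includes: pruningMapper)
        ifFalse: [mappers addFirst: pruningMapper]].

"Hypothetical accessor: registering a class also installs the mapper."
pruneClass := [:aClass |
    prunedObjectClasses add: aClass.
    installIfNeeded value].

"Nothing is registered yet, so the chain still holds only the default mapper."
mappers size.

pruneClass value: WriteStream.
pruneClass value: ReadStream.

"The pruning mapper is now first, and it was added only once."
mappers first == pruningMapper and: [mappers size = 2]
```

Serializations that never register anything to prune then pay no extra accepts: check per object.
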
> I think both Yaron and I feel the Fuel framework is comprehensible and 
> flexible.  We enjoyed using it and while we took two passes at coming up with 
> the pruning scheme we liked (our first was based on not serializing specific 
> inst vars and was much more complex than our second, based on pruning 
> instances of specific classes) we got there quickly and with very little 
> frustration along the way.  Thank you very much.

No, thank you for the feedback. We are writing two papers, and it will help 
Martin's master's thesis, and probably PhD funding too, if we can say that 
people really use his work.

> Finally, a couple of things.  First, it may be more flexible to implement 
> fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to 
> override certain parts of the mapping framework an implementation can access 
> the analyser to find existing clusters, e.g.
> 
> MyClass>>fuelClusterIn: anFLAnalyser
>       ^self shouldBeInASpecialCluster
>               ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>               ifFalse: [super fuelClusterIn: anFLAnalyser]
> 
> This makes it easier to find a specific unique cluster to handle a group of 
> objects specially.
> 
> Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to 
> know a) are these byte values or general integer values, i.e. can there be 
> more than 256 types of cluster?, and b) is there any meaning to the ids?  For 
> example, are clusters ordered by id, or is this just an integer tag?  Also, 
> some class-side code to assign an unused id would be nice.
> 
> You might think of virtualizing the id scheme.  For example, if FLCluster 
> maintained a weak array of all its subclasses then the id of a cluster could 
> be the index in the array, and the array could be cleaned up occasionally.  
> Then each fuel serialization could start with the list of cluster class names 
> and ids, so that specific values of ids are specific to a particular 
> serialization.

We will have to think about that.
What is important is that Fuel should support shape changes and evolution.
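
One way to picture the per-serialization id table proposed above is sketched below. It assumes the header is simply an ordered list of cluster class names and that an id is a position in that list. The cluster names appear only as symbols; idOf and nameOf are hypothetical helpers, and the weak-array bookkeeping is left out.

```smalltalk
"Sketch only: ids are positions in a header written with each serialization."
| clusterNames header idOf nameOf |
clusterNames := #(#FLNilTrueFalseCluster #FLPrunedObjectsCluster #MySpecialCluster).

"Writing: emit the names once, then refer to clusters by their index."
header := OrderedCollection new.
clusterNames do: [:name | header add: name].
idOf := [:name | header indexOf: name].

"Reading: the same header maps each id back to a cluster class name."
nameOf := [:id | header at: id].

"Round trip: the id written for a cluster resolves to the same name."
(nameOf value: (idOf value: #FLPrunedObjectsCluster)) = #FLPrunedObjectsCluster
```

Because the table travels with the data, ids carry no global meaning and only need to be unique within one serialization, so adding or removing cluster classes later does not invalidate old files.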

> 
> Again, thanks for a great framework.
> 
> best,
> Eliot
> 
> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck 
> <[email protected]> wrote:
> 
> 
> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda <[email protected]> wrote:
> Hi Martin and Mariano,
> 
>     a couple of questions.  What's the right way to exclude certain objects 
> from the serialization?  Is there a way of excluding certain inst vars from 
> certain objects?
> 
> 
> 
> Eliot and the rest....Martin implemented this feature in Fuel-MartinDias.258. 
> For the moment, we decided to put #fuelIgnoredInstanceVariableNames at class 
> side.
> 
> Behavior >> fuelIgnoredInstanceVariableNames
>     "Indicates which variables have to be ignored during serialization."
> 
>     ^#()
> 
> 
> MyClass class >> fuelIgnoredInstanceVariableNames
>   ^ #('instVar1')
> 
> 
> The speed impact is negligible, so this is good. Now, we were wondering 
> whether it is common for two different instances of the same class to need 
> different inst vars ignored. Is this common? Do you usually need this? We 
> checked SIXX, where it is on the instance side. Java uses the 'transient' 
> modifier, so there it is on the class side...
> 
> thanks
> 
> 
> -- 
> Mariano
> http://marianopeck.wordpress.com
> 
>
