2011/6/17 Eliot Miranda <[email protected]>:
>
>
> On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier
> <[email protected]> wrote:
>>
>> 2011/6/17 Eliot Miranda <[email protected]>:
>> >
>> >
>> > On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[email protected]>
>> > wrote:
>> >>
>> >> Hi Eliot,
>> >> I am very happy to read your mail.
>> >>
>> >> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda
>> >> <[email protected]>
>> >> wrote:
>> >>>
>> >>> Hi Martin & Mariano,
>> >>> regarding filtering. Yesterday my colleague Yaron and I
>> >>> successfully finished our port of Fuel to Newspeak and are
>> >>> successfully using it to save and restore our data sets; thank
>> >>> you, it's a cool framework. We had to implement two extensions,
>> >>> the first of which is the ability to save and restore Newspeak
>> >>> classes, which is complex because these are instantiated classes
>> >>> inside instantiated Newspeak modules, not static Smalltalk
>> >>> classes in the Smalltalk dictionary. The second extension is the
>> >>> ability to map specific objects to nil, to prune objects on the
>> >>> way out. I want to discuss this latter extension.
>> >>> In our data set we have a set of references to objects that are
>> >>> logically not persistent and hence not to be saved. I'm sure
>> >>> that this will be a common case. The requirement is for the
>> >>> pickling system to prune certain objects, typically by arranging
>> >>> that when an object graph is pickled, references to the pruned
>> >>> objects are replaced by references to nil. One way of doing this
>> >>> is as described below, by specifying per-class lists of instance
>> >>> variables whose referents should not be saved. But this can be
>> >>> clumsy; there may be references to objects one wants to prune
>> >>> from e.g. more than one class, in which case one may have to
>> >>> provide multiple lists of the relevant inst vars; there may be
>> >>> references to objects one wants to prune from e.g. collections
>> >>> (e.g. sets and dictionaries) in which case the instance variable
>> >>> list approach just doesn't work.
>> >>> Here are two more general schemes. First, most directly, Fuel
>> >>> could provide two filters, implemented in the default mapper, or
>> >>> the core analyser. One is a set of classes whose instances are
>> >>> not to be saved. Any reference to an instance of a class in the
>> >>> toBePrunedClasses set is saved as nil. The other is a set of
>> >>> instances that are not to be saved, and also any reference to an
>> >>> instance in the toBePruned set is saved as nil. Why have both?
>> >>> It can be convenient and efficient to filter by class (in our
>> >>> case we had many instances of a specific class, all of which
>> >>> should be filtered, and finding them could be time consuming),
>> >>> but filtering by class can be too inflexible; there may indeed
>> >>> be specific instances to exclude (think for example of part of
>> >>> the object graph that functions as a cache; pruning the specific
>> >>> objects in the cache is the right thing to do; pruning all
>> >>> instances of classes whose instances exist in the cache may
>> >>> prune too much).
>> >>> As an example here's how we implemented pruning. Our system is
>> >>> called Glue, and we start with a mapper for Glue objects,
>> >>> FLGlueMapper:
>> >>> FLMapper subclass: #FLGlueMapper
>> >>>     instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
>> >>>     classVariableNames: ''
>> >>>     poolDictionaries: ''
>> >>>     category: 'Fuel-Core-Mappers'
>> >>> It accepts Newspeak objects and filters instances in the
>> >>> prunedObjectClasses set, and as a side-effect collects certain
>> >>> classes that we need in a manifest:
>> >>> FLGlueMapper>>accepts: anObject
>> >>>     "Tells if the received object is handled by this analyzer.
>> >>>      We want to hand-off instantiated Newspeak classes to the
>> >>>      newspeakClassesCluster, and we want to record other model
>> >>>      classes. We want to filter-out instances of any class in
>> >>>      prunedObjectClasses."
>> >>>     ^anObject isBehavior
>> >>>         ifTrue:
>> >>>             [(self isInstantiatedNewspeakClass: anObject)
>> >>>                 ifTrue: [true]
>> >>>                 ifFalse:
>> >>>                     [(anObject inheritsFrom: GlueDataObject) ifTrue:
>> >>>                         [modelClasses add: anObject].
>> >>>                     false]]
>> >>>         ifFalse:
>> >>>             [prunedObjectClasses includes: anObject class]
>> >>> It prunes by mapping instances of the prunedObjectClasses to a
>> >>> special cluster. It can do this in visitObject: since any
>> >>> Newspeak objects it is accepting will be visited in its
>> >>> visitClassOrTrait: method (i.e. it's implicit that all arguments
>> >>> to visitObject: are instances of classes in the
>> >>> prunedObjectClasses set).
>> >>> FLGlueMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>> FLPrunedObjectsCluster is a specialization of the nil,true,false
>> >>> cluster that maps its objects to nil:
>> >>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>> >>>     instanceVariableNames: ''
>> >>>     classVariableNames: ''
>> >>>     poolDictionaries: ''
>> >>>     category: 'Fuel-Core-Clusters'
>> >>> FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
>> >>>     super serialize: nil on: aWriteStream
>> >>>
>> >>> So this would generalize by the analyser having an e.g.
>> >>> FLPruningMapper as the first mapper, and this having a
>> >>> prunedObjects and a prunedObjectClasses set and going something
>> >>> like this:
>> >>> FLPruningMapper>>accepts: anObject
>> >>>     ^(prunedObjects includes: anObject)
>> >>>         or: [prunedObjectClasses includes: anObject class]
>> >>> FLPruningMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>> and then one would provide accessors in FLSerializer and/or
>> >>> FLAnalyser to add objects and classes to the prunedObjects and
>> >>> prunedObjectClasses set.
>> >>> For efficiency one could arrange that the FLPruningMapper was
>> >>> not added to the sequence of mappers unless and until objects or
>> >>> classes were added to the prunedObjects and prunedObjectClasses
>> >>> set.
>> >>
>> >> Excellent. I love the botanical metaphor of pruning! Of course we
>> >> can include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>> >>
>> >> We are also interested in pruning objects but not necessarily
>> >> replacing them by nil, but by other user-defined objects, for
>> >> example proxies. We can extend the pruning stuff for doing that.
>> >
>> > That was an idea Yaron came up with. That instead of
>> > using fuelIgnoredInstanceVariableNames one uses e.g.
>> > Object>>objectToSerialize
>> >     ^self
>> > and then if one wants to prune specific inst vars in MyClass one
>> > implements
>> > MyClass>>objectToSerialize
>> >     ^self shallowCopy prepareForSerialization
>>
>> Hi Eliot,
>>
>> I'm not convinced by the shallowCopy solution, except for simple
>> structures.
>> If the object graph is complex (has shared nodes, loops, ...) then
>> you're going to end up in a replication problem equivalent to the
>> one Fuel is trying to solve.
>
> The assumption is that the analyser would create a maximum of one
> proxy per object in the graph (default, no proxy) and that it would
> map objects with proxies to their proxies. So if proxies only nilled
> out inst vars I don't see a problem. What's attractive about this is
> that it provides a general solution to a couple of problems, a) how
> to replace a class of objects by some substitute (e.g. nil), b) how
> to prune state that needn't be saved. It is also conceptually simple;
> one just creates a proxy instance; no defining metadata, such as inst
> var names, and hence the code is always up-to-date (e.g. a class
> redefine won't automatically uncover renamed inst vars in
> serialization metadata).

Ah, OK, it occurs after the graph analysis, which I did not catch at
first read. Now I understand better.

Nicolas

>>
>> Nicolas
>>
>> > MyClass>>prepareForSerialization
>> >     instVarIDontWantToSerialize := nil.
>> >     ^self
>> > and for objects one doesn't want to serialize one implements
>> > MyNotToBeSerializedClass>>objectToSerialize
>> >     ^nil
>> > So it's more general. But I would pass the analyser in as an
>> > argument, which would allow things like
>> > MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self)
>> >         ifFalse: [self]
>> >         ifTrue: [nil]
>> > which would of course be the default in Object:
>> > Object>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
>> >
>> >>
>> >>
>> >>>
>> >>> I think both Yaron and I feel the Fuel framework is
>> >>> comprehensible and flexible. We enjoyed using it and while we
>> >>> took two passes at coming up with the pruning scheme we liked
>> >>> (our first was based on not serializing specific inst vars and
>> >>> was much more complex than our second, based on pruning
>> >>> instances of specific classes) we got there quickly and with
>> >>> very little frustration along the way. Thank you very much.
>> >>
>> >> :-) thank you!
>> >>
>> >>>
>> >>> Finally, a couple of things. First, it may be more flexible to
>> >>> implement fuelCluster as fuelClusterIn: anFLAnalyser so that if
>> >>> one is trying to override certain parts of the mapping framework
>> >>> an implementation can access the analyser to find existing
>> >>> clusters, e.g.
>> >>> MyClass>>fuelClusterIn: anFLAnalyser
>> >>>     ^self shouldBeInASpecialCluster
>> >>>         ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>> >>>         ifFalse: [super fuelClusterIn: anFLAnalyser]
>> >>> This makes it easier to find a specific unique cluster to handle
>> >>> a group of objects specially.
>> >>
>> >> I can't imagine a concrete example but I see that it is more
>> >> flexible... the cluster obtained via double dispatch can be
>> >> anything polymorphic with MySpecialCluster... that's the point?
>> >
>> > To be honest I'm not sure. But passing in the analyser in things
>> > like fuelCluster or objectToSerialize is I think a good idea as it
>> > provides a convenient communication path which in turn provides
>> > considerable flexibility.
>> >>
>> >>
>> >>>
>> >>> Lastly, the class-side cluster ids are a bit of a pain. It would
>> >>> be nice to know a) are these byte values or general integer
>> >>> values, i.e. can there be more than 256 types of cluster?, and
>> >>> b) is there any meaning to the ids? For example, are clusters
>> >>> ordered by id, or is this just an integer tag?
>> >>> Also, some class-side code to assign an unused id would be nice.
>> >>> You might think of virtualizing the id scheme. For example, if
>> >>> FLCluster maintained a weak array of all its subclasses then the
>> >>> id of a cluster could be the index in the array, and the array
>> >>> could be cleaned up occasionally.
>> >>> Then each Fuel serialization could start with the list of
>> >>> cluster class names and ids, so that specific values of ids are
>> >>> specific to a particular serialization.
>> >>
>> >> I do agree, these ids are a heritage from the first prototypes of
>> >> Fuel; they should be revised. a) yes, it is encoded in only one
>> >> byte; b) just an integer tag, the only purpose of the id was for
>> >> decoding fast: read a byte and then look in a dictionary for the
>> >> corresponding cluster instance. We could even store the cluster
>> >> class name but that's inefficient.
>> >
>> > Yes, but how inefficient? What's the size of all the cluster names?
>> > FLCluster allSubclasses inject: 0 into: [:t :c| t + c name size + 1]
>> > 670
>> >
>> > So you'd add less than a kilobyte to the size of each
>> > serialization and get complete freedom from ids. Something to
>> > think about.
>> >>
>> >> Virtualizing the id scheme is a good idea. Much more elegant and
>> >> extensible. The current mechanism not only limits the number of
>> >> possible clusters, but also "user defined" extensions can
>> >> collide, for example if your Glue cluster id is the same as the
>> >> Moose cluster id.
>> >>
>> >> I added an issue in our tracker.
>> >>
>> >> If it makes sense, maybe the weak array you suggest can be also
>> >> used to avoid instantiating lots of FLObjectCluster like we are
>> >> doing in Object:
>> >>
>> >> fuelCluster
>> >>     ^ self class isVariable
>> >>         ifTrue: [ FLVariableObjectCluster for: self class ]
>> >>         ifFalse: [ FLFixedObjectCluster for: self class ]
>> >>
>> >> the second time you send fuelCluster to an object, it can reuse
>> >> the cluster instance.
>> >
>> > Right. I think that's important, and is one reason why I think
>> > passing in the analyser is important, because it allows certain
>> > objects to discover existing clusters in the analyzer and join
>> > them if they want to, instead of having to invent and maintain
>> > their own cluster uniquing solution.
>> >>>
>> >>> again thanks for a great framework.
>> >>
>> >> Thanks for your words and the feedback. Is Glue published
>> >> somewhere?
>> >
>> > No, and it's extremely proprietary :) Newspeak however is
>> > available and we may end up maintaining a port of Fuel for
>> > Newspeak.
>> > best regards,
>> > Eliot
>> >
>> >>
>> >> regards
>> >> Martin
>> >>
>> >>
>> >>>
>> >>> best,
>> >>> Eliot
>> >>
>> >>
>> >>>
>> >>> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi Martin and Mariano,
>> >>>>> a couple of questions. What's the right way to exclude
>> >>>>> certain objects from the serialization? Is there a way of
>> >>>>> excluding certain inst vars from certain objects?
>> >>>>
>> >>>>
>> >>>> Eliot and the rest... Martin implemented this feature in
>> >>>> Fuel-MartinDias.258. For the moment, we decided to put
>> >>>> #fuelIgnoredInstanceVariableNames at class side.
>> >>>>
>> >>>> Behavior >> fuelIgnoredInstanceVariableNames
>> >>>>     "Indicates which variables have to be ignored during
>> >>>>      serialization."
>> >>>>
>> >>>>     ^#()
>> >>>>
>> >>>>
>> >>>> MyClass class >> fuelIgnoredInstanceVariableNames
>> >>>>     ^ #('instVar1')
>> >>>>
>> >>>>
>> >>>> The speed impact is negligible, so this is good. Now... we were
>> >>>> wondering whether it is common for two different instances of
>> >>>> the same class to need different instVars ignored. Is this
>> >>>> common? Do you usually need this?
>> >>>> We checked in SIXX and it is at instance side. Java uses the
>> >>>> prefix 'transient' so it is at class side...
>> >>>>
>> >>>> thanks
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Mariano
>> >>>> http://marianopeck.wordpress.com
>> >>>>
>> >>>
>> >>
>> >
>> >
>
>
>
> --
> best,
> Eliot
>
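
The pruning rule discussed in this thread (prune an object if it is in a set of instances, or if its class is in a set of classes, and substitute nil for it) can be tried out in a workspace without touching Fuel. The following is only a minimal sketch in Pharo/Squeak-style Smalltalk: it uses plain collections and blocks and calls no Fuel API. The names prunedObjects, prunedObjectClasses, shouldPrune and objectToSerialize are borrowed from the messages above; the sample data (a stand-in cache object, the choice of Fraction as the pruned class) and the block-based structure are assumptions made purely for illustration.

```smalltalk
"Hypothetical workspace sketch: plain collections only, no Fuel classes."
| cache prunedObjects prunedObjectClasses shouldPrune objectToSerialize roots |

"Stand-in for an object that is logically not persistent."
cache := OrderedCollection new.

"Filter 1: specific instances to prune (compared by identity)."
prunedObjects := IdentitySet with: cache.

"Filter 2: classes whose instances are all pruned (Fraction is an arbitrary choice)."
prunedObjectClasses := Set with: Fraction.

"Same test as the proposed FLPruningMapper>>accepts:."
shouldPrune := [:anObject |
	(prunedObjects includes: anObject)
		or: [prunedObjectClasses includes: anObject class]].

"Same shape as the proposed objectToSerializeIn:, with nil as the substitute."
objectToSerialize := [:anObject |
	(shouldPrune value: anObject)
		ifTrue: [nil]
		ifFalse: [anObject]].

"Apply the substitution to a few sample references."
roots := {cache. (3/4). 'keep me'. 42}.
roots collect: objectToSerialize
```

Evaluating the last expression should give an array in which the cache and the fraction have been replaced by nil while the string and the integer are kept. Replacing the `[nil]` branch with a block that answers a proxy would give the user-defined substitution Martin asks about, though how that would hook into Fuel's analyser is not something this sketch attempts to show.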
