Well... it is much slower :( It seems that the cost of (aKey identityHash + (aKey mareaClass identityHash bitShift: 12) + (aKey basicSize bitShift: 24)) is bigger than the cost of the collisions. Anyway, thanks for the nice thread. I learned from it.
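A minimal workspace sketch of this kind of comparison, assuming a Pharo image. It is not the Fuel benchmark itself: the object count, the use of plain Object instances, and the use of #class instead of #mareaClass are illustrative assumptions. It only measures the cost of computing each hash, not the cost of collisions inside the dictionary.

```smalltalk
"Hypothetical sketch: compare the cost of the two hash computations in isolation.
 The sample objects and the count of 100000 are assumptions for illustration."
| objects plain extended |
objects := (1 to: 100000) collect: [ :i | Object new ].

"Cost of the plain primitive: one #identityHash send per object."
plain := [ objects do: [ :each | each identityHash ] ] timeToRun.

"Cost of the extended hash: two #identityHash sends, one #basicSize send,
 two #bitShift: sends and two additions per object."
extended := [ objects do: [ :each |
	each identityHash
		+ (each class identityHash bitShift: 12)
		+ (each basicSize bitShift: 24) ] ] timeToRun.

Transcript
	show: 'identityHash: ', plain printString; cr;
	show: 'extended hash: ', extended printString; cr.
```

To compare the whole picture, the same two computations would have to be plugged into the dictionary lookup itself, since the extended hash only pays off if the saved collisions outweigh the extra sends.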
Cheers

On Tue, Dec 13, 2011 at 8:51 PM, Mariano Martinez Peck <marianop...@gmail.com> wrote:

> On Tue, Dec 13, 2011 at 8:44 PM, Eliot Miranda <eliot.mira...@gmail.com> wrote:
>
>> Hi Mariano,
>>
>> On Tue, Dec 13, 2011 at 12:37 AM, Mariano Martinez Peck <marianop...@gmail.com> wrote:
>>
>>> On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <m...@mjr104.co.uk> wrote:
>>>
>>>> Hi Mariano, when I read this thread I was a bit confused that you
>>>> wanted an IdentitySet that used #hash. From that statement it sounded like
>>>> you just wanted a Set. This would allow any object to define its own hash
>>>> and, importantly, what equality means with #=. So if you want to delegate
>>>> that to the object, use a Set.
>>>>
>>> No, I cannot use a Set because I cannot have repetitions. What I am
>>> asking about is what we do in the Fuel serializer while traversing the object graph.
>>> Each analyzed object is put in an IdentityDictionary (but the question is
>>> the same for IdentitySet) as a key. To avoid cycles, I need to put each
>>> object only once. Since graphs can be very big, such a dictionary could hold lots
>>> of objects.
>>>
>>>> However as the thread has gone on perhaps you want the identity
>>>> relationship but you just wanted a bigger identity hash space?
>>>
>>> Yes, exactly. I was wondering whether there could be a way (maybe... that's
>>> why I am asking) of improving its performance considering that I could have
>>> many more objects than 2^13. In other words, I wanted to see if I could
>>> avoid collisions.
>>
>> You can assume that certain properties of objects will not change during
>> serialization, for example the class of objects, the basic size of objects.
>> So you can construct a valid extended identity hash from these properties.
>> For example
>>
>>     fuelSerializationHash
>>         ^self identityHash + (self class identityHash bitShift: 12) + (self basicSize bitShift: 24)
>>
> Thanks!!!
> that's exactly what I wanted to try :)
>
>> In general this idea may not work because of meta-primitives like
>> changeClassTo:, which would change the hash. But in Fuel's case I think
>> it's safe to assume that objects won't change class or size during
>> serialization.
>
> Exactly :) Moreover, if objects change their state or shape or, in
> other words, the graph changes while we are in the middle of the
> serialization, we will be screwed anyway... whether the hash has changed or
> not :)
>
> I will see which is worse: the collisions, or having to send #identityHash
> twice and #basicSize once.
>
> Thanks!
>
>> HTH
>>
>>>> IdentitySets are fast because they bypass any delegation. Once you have
>>>> seen the object in your traversal (common usage pattern) that's it. You
>>>> grab its identity, which is a pretty low level thing to do.
>>>
>>> Ok... it is a tradeoff. If I use #identityHash it is fast because there
>>> is no delegation and it is almost an immediate primitive. But I guess it
>>> will be slow if there are lots of collisions. Using something other than
>>> #identityHash could maybe decrease the number of collisions, but
>>> with the delegation it may get slower.
>>>
>>> I will try Levente's idea of what is done in SystemTracer: use as
>>> a hash the identityHash of the object mixed with the identityHash of its
>>> class. Maybe that decreases the collisions and at the same time I don't pay
>>> for delegation (#class is a special bytecode, so it costs nothing, and ok... there are 2
>>> sends of #identityHash but I don't think that is much).
>>>
>>> Anyway, thanks for the interesting post, I always learn :)
>>>
>>>> As for what happens with collections who knows. Depends. Relying on
>>>> identity set semantics for a collection is easy. Set semantics is not so.
>>>> Remember that both hash and equals are important to know if the set already
>>>> contains the element.
>>>> Depending on the collection implementation both of
>>>> these could be composite in terms of the parts. Who knows where you end or
>>>> how long it takes. I.e. if it is a function of the size of the collection
>>>> and further collections are composite....
>>>>
>>> yes...
>>>
>>>> Cheers
>>>> Mike
>>>>
>>>> On Tuesday, December 13, 2011, Carlo <snoob...@yahoo.ie> wrote:
>>>> > Hi Mariano
>>>> > I'm no expert either ;)
>>>> > Without having access to the exact code it would look like either you
>>>> > have a collection that references itself (which would break all collection
>>>> > implementations) or maybe the tests have just slowed down to the point
>>>> > where you think it's 'crashed'.
>>>> > Do you have any more info or perhaps which methods you changed on
>>>> > IdentitySet?
>>>> > Cheers
>>>> > Carlo
>>>> > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote:
>>>> >
>>>> > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck <marianop...@gmail.com> wrote:
>>>> >>
>>>> >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <snoob...@yahoo.ie> wrote:
>>>> >>>
>>>> >>> Hi
>>>> >>> Wouldn't the fact that you use hash cause potential loops now? e.g.
>>>> >>> collection refers to another object that refers to first collection. -->
>>>> >>> aCollection>>hash references an item which causes this current collection's
>>>> >>> hash to be called again?
>>>> >>
>>>> >> Hi Carlo. I am still a newbie with Collections but I think I am having
>>>> >> exactly that problem. During my tests, it loops in Collection >> #hash
>>>> >> when sending #hash to its elements.
>>>> >> Sorry, but I couldn't understand what the cause of the problem is.
>>>> >> Why doesn't it work while it does using #identityHash? Could you
>>>> >> elaborate?
>>>> >
>>>> > Well, now I understood, and I understand also why it doesn't happen
>>>> > with #identityHash. But what happens then with regular Dictionaries using
>>>> > #hash? Why doesn't it happen there?
>>>> >
>>>> >> thanks
>>>> >>
>>>> >>> identityHash is deterministic in this case.
>>>> >>> Does this help?
>>>> >>> Cheers
>>>> >>> Carlo
>>>> >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote:
>>>> >>> Hi guys. I hope this is not a very stupid question. Background: in
>>>> >>> Fuel we have an IdentityDictionary where we put each of the objects we find
>>>> >>> while traversing the graph. We need to use IdentitySet because we cannot
>>>> >>> have repetitions (and to avoid loops) so we NEED to use #==. In such a
>>>> >>> dictionary we put ALL objects of the graph, so it can be very big. Since
>>>> >>> IdentitySet uses #identityHash, it means it will be using ONLY those 12
>>>> >>> bits in the object header. It means that we have 2^12 = 4096 different
>>>> >>> values.
>>>> >>>
>>>> >>> Question: having explained the previous, I wanted to be able to
>>>> >>> use #hash rather than #identityHash since several classes implement #hash
>>>> >>> and hence I thought that using #hash I could have fewer collisions and hence
>>>> >>> better performance. I tried to make a subclass of IdentitySet that uses
>>>> >>> #hash rather than #identityHash but my image freezes. I also tried
>>>> >>> something like:
>>>> >>>
>>>> >>> set := PluggableSet new.
>>>> >>> set hashBlock: [ :elem | elem hash ].
>>>> >>> set equalBlock: [ :a :b | a == b ].
>>>> >>>
>>>> >>> But it doesn't work either. It works with simple tests in a
>>>> >>> workspace but when I run the full tests of Fuel, it enters a loop in the
>>>> >>> method #hash of Collection.
>>>> >>>
>>>> >>> Anyway, my question is, should that work? If not, what is the exact
>>>> >>> reason?
>>>> >>>
>>>> >>> Thanks in advance,
>>>> >>>
>>>> >>> --
>>>> >>> Mariano
>>>> >>> http://marianopeck.wordpress.com
>>
>> --
>> best,
>> Eliot

--
Mariano
http://marianopeck.wordpress.com