Ok I see. Have you measured the collisions on a very big graph? What stats do you get?
Cheers Mike On 13 Dec 2011, at 08:37, Mariano Martinez Peck <[email protected]> wrote: > > > On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <[email protected]> wrote: > Hi Mariano, when I read this thread I was a bit confused that you wanted an > IdentitySet that used #hash. From that statement it sounded like you just > wanted a Set. This would allow any object to define its own hash and > importantly what equality means with #=. So if you want to delegate that to > the object use a Set. > > > No, I cannot use a Set because I cannot have repetitions. This that I am > asking is what we do in Fuel serializer while traversing the object graph. > Each analyzed object is put in an IdentityDictionary (but the question is the > same for IdentitySet) as key. To avoid cycles, I need to put each object only > once. Since graphs can be very big, such dict could have lots of objects. > > However as the thread has gone on perhaps you want the identity relationship > but you just wanted a bigger identity hash space? > > Yes, exactly. I were thinking if there could be a way (maybe...that's why I > am asking) of improve its performance considering that I could have much more > objects than 2^13. In other words, I wanted to see if I could avoid > colisions. > > > IdentitySets are fast because they bypass any delegation. Once you have seen > the object in your traversal (common usage pattern) that's it. You grab it's > identity which is a pretty low level thing to do. > > Ok...it is a tradeoff. If I use #identityHash it is fast because there is no > delegation and it is almost an immediate primitive. But I gues it will be > slow if there are lots of colisions. Not using #identityHash but something > else maybe could decrease maybe the amount of colisions, but maybe with the > delegation it will gets slower. > > I will try with Levente idea of what it is done in SystemTracer: use as a > hash the identityHash of the object mixed with the identityHash of its class. > Maybe that decreases the colisions and at the same time I don't pay > delegation (#class is special bytecode, so nothing, and ok..there are 2 sends > to #identityhash but I don't thinnk it is that much). > > Anyway, thanks for the interesting post, I always learn :) > > > As for what happens with collections who knows. Depends. Relying on identity > set semantics for a collection is easy. Set semantics is not so. Remember > that both hash and equals are important to know if the set already contains > the element. Depending on the collection implementation both of these could > be composite in terms of the parts. Who knows where you end or how long it > takes. I.e. if it is a function of the size of the collection and further > collections are composite.... > > > yes... > > Cheers > Mike > > > > On Tuesday, December 13, 2011, Carlo <[email protected]> wrote: > > Hi Mariano > > I'm no expert either ;) > > Without having access to the exact code it would look like either you have > > a collection that references itself (which would break all collection > > implementations) or maybe the tests have just slowed down to the point > > where you think it's 'crashed'. > > Do you have anymore info or perhaps which methods you changed on > > IdentitySet? > > Cheers > > Carlo > > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote: > > > > > > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck > > <[email protected]> wrote: > >> > >> > >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <[email protected]> wrote: > >>> > >>> Hi > >>> Wouldn't the fact that you use hash cause potential loops now? e.g. > >>> collection refers to another object that refers to first collection. --> > >>> aCollection>>hash references an item which causes this current > >>> collection's hash to be called again? > >> > >> Hi Carlo. I am still newbie with Collections but I think I am having > >> exactly that problem. During my tests, it loops in Collection >> #hash > >> when sending #hash to its elements. > >> Sorry, but I couldn't undertand what is the cause of the problem? why it > >> doesn't work while it does using #identityHash? could you elaborate? > >> > > > > Well, now I understood, and I understand also why it doesn't happen with > > #identityHash. But what happens then with regular Dictionaries using #hash? > > why it doesn't happen there? > > > >> > >> thanks > >> > >>> > >>> identityHash is deterministic in this case. > >>> Does this help? > >>> Cheers > >>> Carlo > >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote: > >>> Hi guys. I hope this is not a very stupid question. Background: in Fuel > >>> we have a IdentityDictionary where we put each of the objects we find > >>> while traversing the graph. We need to use IdentitySet because we cannot > >>> have repetitions (and to avoid loops) so we NEED to use #==. In such > >>> dictionary we put ALL objects of the graph, so it can be very big. Since > >>> IdentitySet uses #identityHash, it means it will be using those ONLY 12 > >>> bits in the object header. It means that we have 2^12 = 4096 different > >>> values. > >>> > >>> Question: having explained the previous, I wanted to be able to use > >>> #hash rather than #identityHash since several classes implement #hash and > >>> hence I thought that using #hash I could have less colisions and hence a > >>> better performance. I tried to make a subclass of IdentitySet that uses > >>> #hash rather than #identityHash but my image freezes. I also tried > >>> something like: > >>> > >>> set := PluggableSet new. > >>> set hashBlock: [ :elem | elem hash ]. > >>> set equalBlock: [ :a :b | a == b ]. > >>> > >>> But it doesn't work either. I works with simple tests in a workspace but > >>> when I run the full tests of Fuel, it enters in a loop in the method > >>> #hash of Collection.. > >>> > >>> Anyway, my question is, should that work? if not, what is the exact > >>> reason? > >>> > >>> Thanks in advance, > >>> > >>> -- > >>> Mariano > >>> http://marianopeck.wordpress.com > >>> > >>> > >> > >> > >> > >> -- > >> Mariano > >> http://marianopeck.wordpress.com > >> > > > > > > > > -- > > Mariano > > http://marianopeck.wordpress.com > > > > > > > > > > -- > Mariano > http://marianopeck.wordpress.com >
