Ok I see. Have you measured the collisions on a very big graph? What stats do 
you get?

Cheers Mike

On 13 Dec 2011, at 08:37, Mariano Martinez Peck <[email protected]> wrote:

> 
> 
> On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <[email protected]> wrote:
> Hi Mariano, when I read this thread I was a bit confused that you wanted an 
> IdentitySet that used #hash. From that statement it sounded like you just 
> wanted a Set. This would allow any object to define its own hash and 
> importantly what equality means with #=. So if you want to delegate that to 
> the object use a Set. 
> 
> 
> No, I cannot use a Set because I cannot have repetitions. This that I am 
> asking is what we do in Fuel serializer while traversing the object graph. 
> Each analyzed object is put in an IdentityDictionary (but the question is the 
> same for IdentitySet) as key. To avoid cycles, I need to put each object only 
> once. Since graphs can be very big, such dict could have lots of objects. 
>  
> However as the thread has gone on perhaps you want the identity relationship 
> but you just wanted a bigger identity hash space?
> 
> Yes, exactly. I were thinking if there could be a way (maybe...that's why I 
> am asking) of improve its performance considering that I could have much more 
> objects than 2^13. In other words, I wanted to see if I could avoid 
> colisions. 
>  
> 
> IdentitySets are fast because they bypass any delegation. Once you have seen 
> the object in your traversal (common usage pattern) that's it. You grab it's 
> identity which is a pretty low level thing to do. 
> 
> Ok...it is a tradeoff. If I use #identityHash it is fast because there is no 
> delegation and it is almost an immediate primitive. But I gues it will be 
> slow if there are lots of colisions. Not using #identityHash but something 
> else maybe could decrease maybe the amount of colisions, but maybe with the 
> delegation it will gets slower.
> 
> I will try with Levente idea of what it is done in SystemTracer: use as a 
> hash the identityHash of the object mixed with the identityHash of its class. 
> Maybe that decreases the colisions and at the same time I don't pay 
> delegation (#class is special bytecode, so nothing, and ok..there are 2 sends 
> to #identityhash but I don't thinnk it is that much).
> 
> Anyway, thanks for the interesting post, I always learn :)
>  
> 
> As for what happens with collections who knows. Depends. Relying on identity 
> set semantics for a collection is easy. Set semantics is not so. Remember 
> that both hash and equals are important to know if the set already contains 
> the element. Depending on the collection implementation both of these could 
> be composite in terms of the parts. Who knows where you end or how long it 
> takes. I.e. if it is a function of the size of the collection and further 
> collections are composite....
> 
> 
> yes...
>  
> Cheers
> Mike
> 
> 
> 
> On Tuesday, December 13, 2011, Carlo <[email protected]> wrote:
> > Hi Mariano
> > I'm no expert either ;)
> > Without having access to the exact code it would look like either you have 
> > a collection that references itself (which would break all collection 
> > implementations) or maybe the tests have just slowed down to the point 
> > where you think it's 'crashed'.
> > Do you have anymore info or perhaps which methods you changed on 
> > IdentitySet?
> > Cheers
> > Carlo
> > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote:
> >
> >
> > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck 
> > <[email protected]> wrote:
> >>
> >>
> >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <[email protected]> wrote:
> >>>
> >>> Hi
> >>> Wouldn't the fact that you use hash cause potential loops now? e.g. 
> >>> collection refers to another object that refers to first collection. --> 
> >>> aCollection>>hash references an item which causes this current 
> >>> collection's hash to be called again?
> >>
> >> Hi Carlo. I am still newbie with Collections but I think I am having 
> >> exactly that problem. During my tests, it loops in Collection >> #hash  
> >> when sending #hash to its elements.
> >> Sorry, but I couldn't undertand what is the cause of the problem?  why it 
> >> doesn't work while it does using #identityHash?  could you elaborate?
> >>
> >
> > Well, now I understood, and I understand also why it doesn't happen with 
> > #identityHash. But what happens then with regular Dictionaries using #hash? 
> > why it doesn't happen there?
> >  
> >>
> >> thanks
> >>  
> >>>
> >>> identityHash is deterministic in this case.
> >>> Does this help?
> >>> Cheers
> >>> Carlo
> >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote:
> >>> Hi guys. I hope this is not a very stupid question. Background: in Fuel 
> >>> we have a IdentityDictionary where we put each of the objects we find 
> >>> while traversing the graph. We need to use IdentitySet because we cannot 
> >>> have repetitions (and to avoid loops) so we NEED to use #==. In such 
> >>> dictionary we put ALL objects of the graph, so it can be very big. Since 
> >>> IdentitySet uses #identityHash, it means it will be using those ONLY 12 
> >>> bits in the object header. It means that we have 2^12 = 4096  different 
> >>> values. 
> >>>
> >>> Question:  having explained the previous, I wanted to be able to use 
> >>> #hash rather than #identityHash since several classes implement #hash and 
> >>> hence I thought that using #hash I could have less colisions and hence a 
> >>> better performance. I tried to make a subclass of IdentitySet that uses 
> >>> #hash rather than #identityHash but my image freezes. I also tried 
> >>> something like:
> >>>
> >>> set := PluggableSet new.
> >>>     set hashBlock: [ :elem | elem hash ].
> >>>     set equalBlock: [ :a :b | a == b ].
> >>>
> >>> But it doesn't work either. I works with simple tests in a workspace but 
> >>> when I run the full tests of Fuel, it enters in a loop in the method 
> >>> #hash of Collection..
> >>>
> >>> Anyway, my question is, should that work? if not, what is the exact 
> >>> reason?
> >>>
> >>> Thanks in advance,
> >>>
> >>> --
> >>> Mariano
> >>> http://marianopeck.wordpress.com
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Mariano
> >> http://marianopeck.wordpress.com
> >>
> >
> >
> >
> > --
> > Mariano
> > http://marianopeck.wordpress.com
> >
> >
> >
> 
> 
> 
> -- 
> Mariano
> http://marianopeck.wordpress.com
> 

Reply via email to