Well... it is much slower :(  It seems that the cost of
(aKey identityHash + (aKey mareaClass identityHash bitShift: 12) + (aKey basicSize bitShift: 24))
is bigger than the cost of the collisions.
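
For anyone who wants to reproduce this kind of comparison, here is a minimal
workspace sketch, not the actual Fuel code. It assumes a Squeak/Pharo image,
uses #class instead of the #mareaClass selector used above, and the variable
names, the object mix and the object count are made up for illustration.
PluggableSet evaluates a block on every lookup, so the numbers only give a
rough idea of the trade-off:

```smalltalk
"Hypothetical benchmark: plain IdentitySet vs. an extended identity hash."
| objects plain extended tPlain tExtended |

"Mix of classes and sizes, so the class and basicSize bits actually vary."
objects := (1 to: 100000) collect: [ :i |
    i odd
        ifTrue: [ Object new ]
        ifFalse: [ Array new: i \\ 16 ] ].

"Baseline: identity comparison with the plain #identityHash."
plain := IdentitySet new.
tPlain := Time millisecondsToRun: [
    objects do: [ :each | plain add: each ] ].

"Same identity comparison, but with the extended hash from this thread."
extended := PluggableSet new.
extended hashBlock: [ :each |
    each identityHash
        + (each class identityHash bitShift: 12)
        + (each basicSize bitShift: 24) ].
extended equalBlock: [ :a :b | a == b ].
tExtended := Time millisecondsToRun: [
    objects do: [ :each | extended add: each ] ].

Transcript
    show: 'IdentitySet: ', tPlain printString, ' ms'; cr;
    show: 'Extended hash: ', tExtended printString, ' ms'; cr.
```

Both sets must end up with the same number of elements, since both compare
with #==; only the distribution over the hash buckets differs.
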
Anyway, thanks for the nice thread. I learned a lot.

Cheers

On Tue, Dec 13, 2011 at 8:51 PM, Mariano Martinez Peck <
marianop...@gmail.com> wrote:

>
>
> On Tue, Dec 13, 2011 at 8:44 PM, Eliot Miranda <eliot.mira...@gmail.com> wrote:
>
>> Hi Mariano,
>>
>> On Tue, Dec 13, 2011 at 12:37 AM, Mariano Martinez Peck <
>> marianop...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <m...@mjr104.co.uk> wrote:
>>>
>>>> Hi Mariano, when I read this thread I was a bit confused that you
>>>> wanted an IdentitySet that used #hash. From that statement it sounded like
>>>> you just wanted a Set. This would allow any object to define its own hash
>>>> and importantly what equality means with #=. So if you want to delegate
>>>> that to the object use a Set.
>>>>
>>>>
>>> No, I cannot use a Set because I cannot have repetitions. What I am
>>> asking about is what we do in the Fuel serializer while traversing the
>>> object graph. Each analyzed object is put in an IdentityDictionary (but
>>> the question is the same for IdentitySet) as a key. To avoid cycles, I
>>> need to put each object only once. Since graphs can be very big, such a
>>> dictionary could hold lots of objects.
>>>
>>>
>>>> However as the thread has gone on perhaps you want the identity
>>>> relationship but you just wanted a bigger identity hash space?
>>>>
>>>
>>> Yes, exactly. I was wondering whether there could be a way (maybe...
>>> that's why I am asking) of improving its performance, considering that
>>> I could have many more objects than 2^12. In other words, I wanted to
>>> see if I could avoid collisions.
>>>
>>
>> You can assume that certain properties of objects will not change during
>> serialization, for example the class of objects, the basic size of objects.
>>  So you can construct a valid extended identity hash from these properties.
>>  For example
>>
>> fuelSerializationHash
>>     ^self identityHash
>>         + (self class identityHash bitShift: 12)
>>         + (self basicSize bitShift: 24)
>>
>>
> Thanks!!! that's exactly what I wanted to try :)
>
>
>> In general this idea may not work because of meta-primitives like
>> changeClassTo:, which would change the hash.  But in Fuel's case I think
>> it's safe to assume that objects won't change class or size during
>> serialization.
>>
>
> Exactly :)  Moreover, if objects change their state or shape, or, in
> other words, if the graph changes while we are in the middle of the
> serialization, we will be screwed anyway... whether the hash has changed
> or not :)
>
> I will see which is worse: the collisions, or having to send
> #identityHash twice and #basicSize once.
>
> Thanks!
>
>
>>
>> HTH
>>
>>
>>
>>>
>>>
>>>>
>>>> IdentitySets are fast because they bypass any delegation. Once you have
>>>> seen the object in your traversal (common usage pattern) that's it. You
>>>> grab its identity, which is a pretty low-level thing to do.
>>>>
>>>
>>> Ok... it is a tradeoff. If I use #identityHash it is fast because there
>>> is no delegation and it is almost an immediate primitive. But I guess it
>>> will be slow if there are lots of collisions. Using something other than
>>> #identityHash could decrease the number of collisions, but with the
>>> delegation it may get slower.
>>>
>>> I will try Levente's idea, which is what SystemTracer does: use as a
>>> hash the identityHash of the object mixed with the identityHash of its
>>> class. Maybe that decreases the collisions and at the same time I don't
>>> pay for delegation (#class is a special bytecode, so it costs nothing,
>>> and ok... there are 2 sends of #identityHash, but I don't think that is
>>> much).
>>>
>>> Anyway, thanks for the interesting post, I always learn :)
>>>
>>>
>>>>
>>>> As for what happens with collections who knows. Depends. Relying on
>>>> identity set semantics for a collection is easy. Set semantics is not so.
>>>> Remember that both hash and equals are important to know if the set already
>>>> contains the element. Depending on the collection implementation both of
>>>> these could be composite in terms of the parts. Who knows where you end or
>>>> how long it takes. I.e. if it is a function of the size of the collection
>>>> and further collections are composite....
>>>>
>>>>
>>> yes...
>>>
>>>
>>>> Cheers
>>>> Mike
>>>>
>>>>
>>>>
>>>> On Tuesday, December 13, 2011, Carlo <snoob...@yahoo.ie> wrote:
>>>> > Hi Mariano
>>>> > I'm no expert either ;)
>>>> > Without having access to the exact code it would look like either you
>>>> have a collection that references itself (which would break all collection
>>>> implementations) or maybe the tests have just slowed down to the point
>>>> where you think it's 'crashed'.
>>>> > Do you have any more info or perhaps which methods you changed on
>>>> IdentitySet?
>>>> > Cheers
>>>> > Carlo
>>>> > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote:
>>>> >
>>>> >
>>>> > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck <
>>>> marianop...@gmail.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <snoob...@yahoo.ie> wrote:
>>>> >>>
>>>> >>> Hi
>>>> >>> Wouldn't the fact that you use hash cause potential loops now? e.g.
>>>> collection refers to another object that refers to first collection. -->
>>>> aCollection>>hash references an item which causes this current collection's
>>>> hash to be called again?
>>>> >>
>>>> >> Hi Carlo. I am still a newbie with Collections but I think I am having
>>>> exactly that problem. During my tests, it loops in Collection >> #hash
>>>> when sending #hash to its elements.
>>>> >> Sorry, but I couldn't understand the cause of the problem.
>>>> Why doesn't it work, while it does work using #identityHash? Could you
>>>> elaborate?
>>>> >>
>>>> >
>>>> > Well, now I understand, and I also understand why it doesn't happen
>>>> with #identityHash. But what happens then with regular Dictionaries using
>>>> #hash? Why doesn't it happen there?
>>>> >
>>>> >>
>>>> >> thanks
>>>> >>
>>>> >>>
>>>> >>> identityHash is deterministic in this case.
>>>> >>> Does this help?
>>>> >>> Cheers
>>>> >>> Carlo
>>>> >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote:
>>>> >>> Hi guys. I hope this is not a very stupid question. Background: in
>>>> Fuel we have an IdentityDictionary where we put each of the objects we find
>>>> while traversing the graph. We need to use IdentitySet because we cannot
>>>> have repetitions (and to avoid loops), so we NEED to use #==. In such a
>>>> dictionary we put ALL objects of the graph, so it can be very big. Since
>>>> IdentitySet uses #identityHash, it means it will be using ONLY those 12
>>>> bits in the object header. It means that we have 2^12 = 4096 different
>>>> values.
>>>> >>>
>>>> >>> Question: given the above, I wanted to be able to
>>>> use #hash rather than #identityHash, since several classes implement #hash
>>>> and hence I thought that using #hash I could have fewer collisions and hence
>>>> better performance. I tried to make a subclass of IdentitySet that uses
>>>> #hash rather than #identityHash but my image freezes. I also tried
>>>> something like:
>>>> >>>
>>>> >>> set := PluggableSet new.
>>>> >>> set hashBlock: [ :elem | elem hash ].
>>>> >>> set equalBlock: [ :a :b | a == b ].
>>>> >>>
>>>> >>> But it doesn't work either. It works with simple tests in a
>>>> workspace but when I run the full tests of Fuel, it enters a loop in the
>>>> method #hash of Collection.
>>>> >>>
>>>> >>> Anyway, my question is: should that work? If not, what is the exact
>>>> reason?
>>>> >>>
>>>> >>> Thanks in advance,
>>>> >>>
>>>> >>> --
>>>> >>> Mariano
>>>> >>> http://marianopeck.wordpress.com
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Mariano
>>>> >> http://marianopeck.wordpress.com
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Mariano
>>>> > http://marianopeck.wordpress.com
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>>
>>>
>>
>>
>> --
>> best,
>> Eliot
>>
>>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>


-- 
Mariano
http://marianopeck.wordpress.com
