On 8 Říjen 2013, 6:23, Atri Sharma wrote:
> Hi Tomas,
>
>
>>> Consider the aspects associated with open addressing.Open addressing
>>> can quickly lead to growth in the main table.Also, chaining is a much
>>> cleaner way of collision resolution,IMHO.
>>
>> What do you mean by "growth in the main table"?
>
> Sorry, I should have been more verbose.
>
> AFAIK, Open addressing can be slower with a load factor approaching 1
> as compared to chaining. Also, I feel that implementation of open
> addressing can be more complicated as we have to deal with deletes
> etc.
>
> I feel we can redesign our current chaining mechanism to have skip
> lists instead of singly linked lists. I experimented with it sometime
> back and I feel that it gives a stable performance with higher loads.
>
> Regards,
>
> Atri

OK, thanks for the explanation.

I've spent some time hacking this yesterday, the results are available in
a separate branch:

  https://github.com/tvondra/count_distinct/tree/open-addressing

The complexity of the implementation is pretty much comparable to
chaining. I assume it would get much more complex with handling deletes
etc., but that's not really necessary here.

However I'm unable to make it at least comparable to chaining. The trick
is that the hash table performs reasonably only until it's ~ 65-70% full,
it gets really slow after that. So to maintain performance comparable to
chaining, I'd have to keep the table below this threshold, effectively
wasting ~30% of memory.

And the topic of this thread was about decreasing the memory consumptions,
so it seems to me open-addressing is not a good match here. I'll try a few
more things but I don't think it's going to fly.

I've made some significant improvements in the chaining version (in the
master branch), now getting about the memory consumption I've estimated.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to