Andres Freund <and...@anarazel.de> writes:
> Because iteration (both in my implementation and dynahash) is
> essentially in bucket order, the front of the hashtable will be mostly
> empty, whereas later parts of the hashtable will be full (i.e. a lot of
> collisions). The reason for that is that we'll find all parts of the
> hashtable that are uncompressed and "early" in the hashspace, and
> they'll possibly hash to something later in the table.
Hm, yeah, I had supposed that this would hit a random part of the key
space, but you're right that over successive calls it's not good.
My idea of an appropriate fix would be to resume the scan at the same
point where the last scan stopped, and work circularly around the table
when necessary. But I'm not sure there is any really good way to do that
in the dynahash infrastructure. Maybe it'd work to keep the iteration
state around, but I don't remember how well that interacts with other
insertions/deletions. What about in your implementation?
There's also the point mentioned in the existing comment, that it'd be
better to go after pages with more bits set first. Not sure of an
inexpensive way to do that (ie, one that doesn't involve multiple
scans of the hashtable). But your results suggest that maybe it'd
be worth making tbm_lossify slower in order to get better results.
regards, tom lane
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: