On 10/31/2011 02:44 PM, Robert Haas wrote:
> What I think you're probably measuring here (oprofile would tell us
> for sure) is that once the size of the table goes beyond about half a
> gigabyte, it will have more than one page in the visibility map. The
> index-only scan code keeps the most recently used visibility map page
> pinned to save on overhead, but if you're bouncing back and forth
> between data in the first ~500MB of the table and data in the last
> ~100MB, each switch will result in dropping the current pin and
> getting a new one, which figures to be fairly expensive. When the
> table is only a little over 500MB, you're probably only changing VM
> pages every couple of tuples, but with a 6GB table just about every
> tuple will switch to a new VM page.
>
> Now, maybe you're right and the CPU caches are the more significant
> effect. But I wouldn't like to bet on it without seeing how much the
> drop-and-get-new-pin operations are costing us.
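The "half a gigabyte per VM page" figure follows directly from the VM geometry. A minimal back-of-the-envelope sketch, assuming the default 8 kB block size and one all-visible bit per heap page (as the visibility map worked at the time; later releases use two bits, which would halve the coverage):

```python
# Each visibility map page is one 8 kB block holding one bit per heap page.
BLOCK_SIZE = 8192          # bytes per page (default PostgreSQL block size)

heap_pages_per_vm_page = BLOCK_SIZE * 8                  # 65536 heap pages
heap_bytes_per_vm_page = heap_pages_per_vm_page * BLOCK_SIZE

print(heap_bytes_per_vm_page // (1024 * 1024), "MiB")    # -> 512 MiB
```

So one pinned VM page covers 512 MiB of heap, which is why the behavior changes right around the half-gigabyte table size.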
Maybe I should have left the analysis part out of the post;
I don't know the internals, so my analysis is likely to be wrong.
Now that I think of it, claiming that the cache effect accounts for
50% of the runtime is probably somewhat off.
However, the part about clustering being important still stands:
according to the test, random access to the VM can add about
50% overhead.
Stupid question, but why not keep the whole VM pinned?
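For scale, the whole VM is tiny compared to the heap it describes, which is what makes the question tempting. A rough sketch of the arithmetic (again assuming 8 kB blocks and one bit per heap page):

```python
# Total visibility map size for a given table size.
BLOCK_SIZE = 8192
heap_bytes_per_vm_page = BLOCK_SIZE * 8 * BLOCK_SIZE   # 512 MiB of heap per VM page

for table_gib in (6, 1024):
    table_bytes = table_gib * 1024**3
    vm_pages = -(-table_bytes // heap_bytes_per_vm_page)   # ceiling division
    print(f"{table_gib} GiB table -> {vm_pages} VM pages "
          f"({vm_pages * BLOCK_SIZE // 1024} kB of VM)")
# A 6 GiB table needs only 12 VM pages (96 kB); even a 1 TiB table
# needs just 2048 VM pages (16 MiB).
```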
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)