On 26/02/14 03:08, Robert Haas wrote:
On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris <j...@wizmail.org> wrote:
On 24/02/14 17:38, Robert Haas wrote:
On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris <j...@wizmail.org> wrote:
Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap. The existing
code takes none at all.
Can you explain this further? This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.
In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array. A copy is taken of the SortTuple being
operated on for the inner loop to use. This copy suffers cache misses.
(The inner loop operates on elements between the copy source and the
end of the array).
In the original, the outer loop runs the array in increasing index
order. Again a copy is taken of the SortTuple for the inner loop
to use. This copy does not appear to take significant cache misses.
(The inner loop operates on elements between the copy source and
the start of the array).
Do you have any theory as to why this happens in one case but not the other?
Nope. Only really wild stuff requiring the cpu memory channel
recognising the forward scan (despite the inner loop) and not
the backward one - and ignoring prefetch instructions, which I
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: