On Mon, 2007-06-04 at 18:25 -0400, Tom Lane wrote:
> But note that barring backend crash, once all the scans are done it is
> guaranteed that the hint will be removed --- somebody will be last to
> update the hint, and therefore will remove it when they do heap_endscan,
> even if others are not quite done.  This is good in the sense that
> later-starting backends won't be fooled into starting at what is
> guaranteed to be the most pessimal spot, but it's got a downside too,
> which is that there will be windows where seqscans are in process but
> a newly started scan won't see them.  Maybe that's a killer objection.

I don't think it would be a major objection. If there aren't other
sequential scans in progress, the point is moot, and if there are:
(a) the hint has a lower probability of being removed, since it may
contain the PID of one of those other scans.
(b) the hint is likely to be replaced quite quickly.

The problem is that I think people would be more frustrated if 1 in 1000
queries started its scan in the wrong place because a hint was deleted,
since that could make a major difference in performance. I expect the
current patch to have more consistent performance for that reason.

To me, it seems to be a small benefit and a small cost. It's hard for me
to feel very strongly either way.
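
To make the trade-off concrete, here is a toy model of a hint that
remembers which backend last updated it and is removed only by that
backend at end of scan. It is a sketch for illustration, not the patch's
code: the class, its method names, and the PIDs and block numbers are all
made up.

```python
# Toy model of a per-table scan hint that records who last updated it.
# Every name here is hypothetical; this is not code from the patch.

class ScanHint:
    def __init__(self):
        self.block = None   # last reported block, or None if no hint
        self.pid = None     # backend that last updated the hint

    def report(self, pid, block):
        # Any running scan may overwrite the hint with its own position.
        self.block = block
        self.pid = pid

    def end_scan(self, pid):
        # Only the backend that last updated the hint removes it.
        if self.pid == pid:
            self.block = None
            self.pid = None

    def start_block(self):
        # A new scan starts at the hint if there is one, else at block 0.
        return self.block if self.block is not None else 0


hint = ScanHint()
hint.report(pid=103, block=400)   # scan C reports
hint.report(pid=101, block=500)   # scan A reports
hint.report(pid=102, block=520)   # scan B reports last and owns the hint

hint.end_scan(pid=101)            # A ends; the hint holds B's PID, so it stays
after_a_ends = hint.start_block()

hint.end_scan(pid=102)            # B ends; hint removed though C still runs
during_window = hint.start_block()

hint.report(pid=103, block=700)   # C's next report replaces the hint
after_c_reports = hint.start_block()
```

The three values correspond to point (a), the window Tom describes, and
point (b) respectively.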

> When exactly is the hint updated?  I gathered from something Heikki said
> that it's set after processing X amount of data, but I think it might be
> better to set it *before* processing X amount of data.  That is, the
> hint means "I'm going to be scanning at least <threshold> blocks
> starting here", not "I have scanned <threshold> blocks ending here",
> which seems like the interpretation that's being used at the moment.
> What that would mean is that successive "LIMIT 1000" calls would in fact
> all start at the same place, barring interference from other backends.

If I understand correctly, this is a one-page difference in the report
location, right? We can either report that we've just finished scanning
block 1023 (ending an X-block chunk of reading), so that another backend
can start scanning at 1023 (current behavior); or we could report that
we're about to scan an X-block chunk of data starting with block 1024, so
that the new scan can start at 1024. We don't want the new scan to jump in
ahead of the existing scan, because then we're introducing uncached blocks
between the two scans -- risking divergence.

If the data occupies fewer than X data pages, the LIMIT queries will be
deterministic for a single scan anyway, because no reports will happen
(other than the starting location, which won't matter in this case).
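
The two reporting conventions, and the small-table case, can be sketched
as below. This only illustrates the one-page difference as described
above. The function name is hypothetical, and REPORT_INTERVAL stands in
for X with an assumed value of 1024 chosen to match the block numbers in
the example.

```python
REPORT_INTERVAL = 1024  # stands in for "X"; the value is an assumption


def reported_locations(nblocks, report_before):
    """Blocks reported by a single scan of blocks 0 .. nblocks-1.

    report_before=False: report the last block of each finished chunk.
    report_before=True:  report the first block of the chunk about to be
                         read (only if that block exists).
    The initial starting-location report is left out.
    """
    reports = []
    for block in range(nblocks):
        if (block + 1) % REPORT_INTERVAL != 0:
            continue                      # not at a chunk boundary
        if report_before:
            if block + 1 < nblocks:
                reports.append(block + 1)
        else:
            reports.append(block)
    return reports


after_style = reported_locations(3000, report_before=False)
before_style = reported_locations(3000, report_before=True)
small_table = reported_locations(1000, report_before=False)
```

For a 3000-block table the two conventions report locations one block
apart; for a 1000-block table neither convention reports anything.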

If the data is more than that, then at least one report would have
happened. At this point, you're talking about rewinding the scan (how
far?), which I originally coded for with sync_seqscan_offset. That
feature didn't prove very useful (yet), so I removed it. 
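
For illustration only, rewinding could look like the sketch below. It
assumes a fixed rewind distance in blocks and a scan that wraps around the
end of the table. How the removed sync_seqscan_offset actually computed
the distance is not described here, so the function and its parameters are
hypothetical.

```python
def rewound_start(hint_block, offset, nblocks):
    # Start `offset` blocks behind the reported location, wrapping around
    # the table. `offset` is a hypothetical block count, not a setting
    # taken from the patch.
    return (hint_block - offset) % nblocks


start = rewound_start(hint_block=1023, offset=100, nblocks=3000)
wrapped = rewound_start(hint_block=50, offset=100, nblocks=3000)
```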

        Jeff Davis
