On Tue, 2007-03-13 at 17:11 +0000, Simon Riggs wrote:
> On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
> > On Mon, 2007-03-12 at 13:21 +0000, Simon Riggs wrote:
> > > So based on those thoughts, sync_scan_offset should be fixed at 16,
> > > rather than being variable. In addition, ss_report_loc() should only
> > > report its position every 16 blocks, rather than do this every time,
> > > which will reduce overhead of this call.
> > If we fix sync_scan_offset at 16, we might as well just get rid of it.
> > Sync scans are only useful on large tables, and getting a free 16 pages
> > over a scan isn't worth the trouble. However, even without
> > sync_scan_offset,
> Not sure what you mean by "a free 16 pages". Please explain?
By "free" I mean already in cache, and therefore don't have to do I/O to
get it. I used the term loosely above, so let me re-explain:
My only point was that 16 is essentially 0 when it comes to
sync_scan_offset, because it's a small number of blocks over the course
of the scan of a large table.
If sync_scan_offset is 0, my patch will cause scans on a big table to
start where other scans are, and those scans should tend to stay
together and use newly-cached pages efficiently (and achieve the primary
goal of the patch).
The advantage of sync_scan_offset is that, in some situations, a second
scan can actually finish faster than if it were the only query
executing, because a previous scan has already caused some blocks to be
cached. However, 16 is a small number because that benefit would only be
once per scan, and sync scans are only helpful on large tables.
> > I like the idea of reducing tuning parameters, but we should, at a
> > minimum, still allow an on/off button for sync scans. My tests revealed
> > that the wrong combination of OS/FS/IO-Scheduler/Controller could result
> > in bad I/O behavior.
Do you have an opinion about sync_scan_threshold versus a simple
> I'd still like to be able to trace each scan to see how far ahead/behind
> it is from the other scans on the same table, however we do that.
> Any backend can read the position of other backend's scans, so it should
Where is that information stored? Right now my patch will overwrite the
hints of other backends, because I'm using a static data structure
(rather than one that grows). I do this to avoid the need for locking.
> be easy enough to put in a regular LOG entry that shows how far
> ahead/behind they are from other scans. We can trace just one backend
> and have it report on where it is with respect to other backends, or you
> could have them all calculate their position and have just the lead scan
> report the position of all other scans.
I already have each backend log it's progression through the tablescan
every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
currently use this information to see whether scans are staying together
or not. I think this gives us the information we need without backends
needing to communicate the information during execution.
I think I will increase the resolution of the scan progress so that we
can track every 5k or even 1k blocks read per pid per scan. That might
tell us more about the shared memory usage versus OS cache.
Is there any other information you need reported?
> I'd like to see the trace option to allow us to tell whether its working
> as well as we'd like it to pre-release and in production. Also I want to
> see whether various settings of scan_recycle_buffers help/hinder the
> effectiveness of synch scans, as others have worried it might.
Can you tell me what you mean by trace option, if you mean something
different than tracking the relative positions of the scans?
I will update my patch and send it along so that we can see how they
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?