On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > What should be the maximum size of this hash table? 
> Good question. And also, how do you remove entries from it?
> I guess the size should somehow be related to number of backends. Each 
> backend will realistically be doing just one, or at most two, seq scans at a time. 
>   It also depends on the number of large tables in the databases, but we 
> don't have that information easily available. How about using just 
> NBackends? That should be plenty, but wasting a few hundred bytes of 
> memory won't hurt anyone.

One entry per relation, not per backend, is my current design.
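
To make that concrete, here's a rough standalone sketch of what I have in mind: a small fixed-size table keyed per relation, where each scan reports the block it's currently on and a newly starting scan looks up a hint. All names and sizes are invented for illustration; this is not the actual patch code, and the real thing would key on RelFileNode and live in shared memory under a lock.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in key; the real patch would use RelFileNode. */
typedef struct ScanPosEntry {
    uint32_t relid;     /* 0 = empty slot */
    uint32_t lastblock; /* last block number reported by a scan */
} ScanPosEntry;

#define SCANPOS_SLOTS 64            /* fixed cap: one entry per relation */
static ScanPosEntry scanpos[SCANPOS_SLOTS];

/* Report the current position of a seq scan on relid; linear probing. */
static void scanpos_report(uint32_t relid, uint32_t block)
{
    uint32_t h = relid % SCANPOS_SLOTS;
    for (int i = 0; i < SCANPOS_SLOTS; i++) {
        ScanPosEntry *e = &scanpos[(h + i) % SCANPOS_SLOTS];
        if (e->relid == 0 || e->relid == relid) {
            e->relid = relid;
            e->lastblock = block;
            return;
        }
    }
    /* Table full: silently drop the hint; sync scan is best-effort. */
}

/* Look up a starting point for a new scan; returns 0 (start of
 * relation) when no other scan has reported a position. */
static uint32_t scanpos_lookup(uint32_t relid)
{
    uint32_t h = relid % SCANPOS_SLOTS;
    for (int i = 0; i < SCANPOS_SLOTS; i++) {
        ScanPosEntry *e = &scanpos[(h + i) % SCANPOS_SLOTS];
        if (e->relid == relid)
            return e->lastblock;
        if (e->relid == 0)
            break;
    }
    return 0;
}
```

The key property is that a miss is always safe: a scan that finds no hint just starts at block 0, as today.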

> I think you're going to need an LRU list and counter of used entries in 
> addition to the hash table, and when all entries are in use, remove the 
> least recently used one.
> The thing to keep an eye on is that it doesn't add too much overhead or 
> lock contention in the typical case when there's no concurrent scans.
> For the locking, use a LWLock.

Ok. What would be the potential lock contention in the case of no
concurrent scans?
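
For the LRU part, something like the following sketch might be enough: a fixed-size table where each entry carries a logical "last touched" stamp, and the least recently touched entry is the eviction victim when the table is full. Again, names and sizes are purely illustrative, and the real thing would pair this with the used-entries counter and an LWLock.

```c
#include <assert.h>
#include <stdint.h>

typedef struct LruEntry {
    uint32_t relid;   /* 0 = unused */
    uint32_t block;
    uint64_t stamp;   /* last-touched logical time */
} LruEntry;

#define LRU_SLOTS 4
static LruEntry lru[LRU_SLOTS];
static uint64_t lru_clock;

/* Insert or refresh an entry; evict the least recently used on a
 * full table. */
static void lru_touch(uint32_t relid, uint32_t block)
{
    int victim = 0;
    for (int i = 0; i < LRU_SLOTS; i++) {
        if (lru[i].relid == relid) { victim = i; goto store; }
        if (lru[i].relid == 0)     { victim = i; goto store; }
        if (lru[i].stamp < lru[victim].stamp)
            victim = i;            /* remember least recently used */
    }
store:
    lru[victim].relid = relid;
    lru[victim].block = block;
    lru[victim].stamp = ++lru_clock;
}

static int lru_contains(uint32_t relid)
{
    for (int i = 0; i < LRU_SLOTS; i++)
        if (lru[i].relid == relid)
            return 1;
    return 0;
}
```

In the common single-scan case the lock would only be taken briefly per reported page (or, better, per batch of pages), which should keep contention low.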

Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this will
be obvious.

> No, not the segment. RelFileNode consists of tablespace oid, database 
> oid and relation oid. You can find it in scan->rs_rd->rd_node. The 
> segmentation works at a lower level.

Ok, will do.
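
For reference, the key would have the shape below (mirroring RelFileNode as you describe it; `Oid` shown here as a uint32_t stand-in so the sketch compiles on its own). Since it's a plain three-field struct, it works naturally as a hash key with field-by-field equality:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;   /* stand-in for PostgreSQL's Oid typedef */

/* Identifies a relation's storage: tablespace, database, relation. */
typedef struct RelFileNode {
    Oid spcNode;  /* tablespace oid */
    Oid dbNode;   /* database oid */
    Oid relNode;  /* relation oid */
} RelFileNode;

static int relfilenode_equal(const RelFileNode *a, const RelFileNode *b)
{
    return a->spcNode == b->spcNode &&
           a->dbNode  == b->dbNode  &&
           a->relNode == b->relNode;
}
```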

> Hmm. Should we care then? CFQ is the default on Linux, and an average 
> sysadmin is unlikely to change it.

Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.

> - when ReadBuffer is called, let the caller know if the read did 
> physical I/O.
> - when the previous ReadBuffer didn't result in physical I/O, assume 
> that we're not the pack leader. If the next buffer isn't already in 
> cache, wait a few milliseconds before initiating the read, giving the 
> pack leader a chance to do it instead.
> Needs testing, of course..

An interesting idea. Of the proposals for maintaining a "pack leader",
that's the one I like most. It's very similar to what the Linux
anticipatory scheduler does for us.
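
A toy simulation of the heuristic you describe might look like this. Everything here is invented for illustration (this is not the ReadBuffer API): the follower infers it is not the leader because its last read was a cache hit, and defers only when the next block isn't already resident.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8
static bool cached[NBLOCKS];   /* toy buffer cache: is block resident? */

/* Returns true if the read required physical I/O. */
static bool toy_read_buffer(int blkno)
{
    bool did_io = !cached[blkno];
    cached[blkno] = true;
    return did_io;
}

/* Pause before reading blkno only when our previous read was a cache
 * hit (so a leader seems to be ahead of us) and the next block is not
 * yet cached -- giving the leader a few ms to issue the read first. */
static bool should_wait(bool prev_did_io, int blkno)
{
    return !prev_did_io && !cached[blkno];
}
```

The nice thing is that a scan running alone never waits: every read does physical I/O, so `prev_did_io` stays true and the typical case pays nothing.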

> >> 4. It fails regression tests. You get an assertion failure on the portal 
> >> test. I believe that changing the direction of a scan isn't handled 
> >> properly; it's probably pretty easy to fix.
> >>
> > 
> > I will examine the code more carefully. As a first guess, is it possible
> > that test is failing because of the non-deterministic order in which
> > tuples are returned?
> No, it's an assertion failure, not just different output than expected. 
> But it's probably quite simple to fix..

Ok, I'll find and correct it then.

        Jeff Davis
