Re: [HACKERS] Sequential scans

Heikki Linnakangas Thu, 03 May 2007 11:31:40 -0700

Jeff Davis wrote:

What I was trying to say before is that, in my design, it keeps track of
relations being scanned, not scans happening. So 5 backends scanning the
same table would result in one entry that consists of the most recently-
read block by any of those backends for that relation.


We're talking across each other...

I understand that the data structure keeps track of relations beingscanned, with one entry per such relation. I think that's very gooddesign, and I'm not suggesting to change that.

But what's the right size for that? We don't know how many large tablesthere is in the database, so we can't use that. A hard-coded constantmaybe? What would it be? I figured we could use NBackends, because therecan't realistically be more than one large seq scan per backend going onat any point in time. That's an upper bound, in any real worldapplication there's far less than that.

Part of the reason I did this is because, with a Sync Scan, it seems
like there may be possible reasons to do multiple seq scans concurrently
in the same backend. This is completely contrived, but consider:

SELECT * FROM bigtable UNION ALL SELECT * FROM bigtable;

I'm not trying to argue for such a plan to exist right now, but if there
is a possibility that we'd want to create such plans in the future, I

think it's better to track by relation rather than by backend.

Those two seqscans would be executed one after each other, not inparallel, so you would still have just one seq scan at any given pointin time.

(1) When starting a new scan on relation R, with which backend do we

choose to synchronize?(2) If 3/5 of those scans have finished (and there is therefore no point

to synchronize with those scans), how do we find the two that are still
moving?

I think storing one entry per relation being scanned, regardless of the
number of backends, nicely avoids both of those questions.


Yeah, agreed. That's a good design.

What if I just make an LRU that occupies a fixed size? Any reads or
updates can start at the MRU item and work backward, and any evictions
can happen at the LRU item.

Does a hash table really save us cycles if we keep the list small, say,
100 items? Looking at all 100 items would only be required perhaps on a
scan startup. I don't have a good sense of scale here, is following 50
pointers while holding a lock a significant source of contention?

Hmm. If you only need to scan through it when you start the scan, Isuppose it would be ok.


The idea is that the list would never grow longer than the total number
of tables that are larger than sync_seqscan_threshold, and would have
some maximum (to fit into fixed-size shared memory). The list would be a
bi-directional LRU list (MRU at head, LRU at tail). When evicting an
relation from the list due to space exhaustion, it would delete the tail
of the list. When a new scan starts and it's already in the list, it
would most likely just need to read a few of the hot elements at the
head of the list before it found the relation in question. When it's not
in the list, the scan may need to read all the way through to see that
it's not there.

Yeah, that works well when there's just a few hot elements in the list.I'm not sure how large the list would need to grow until scanning itwould become a performance bottleneck, but I suppose a list of 5-10elements would be perfectly fine. But is that enough?

100 is of course arbitrary, and that could grow or shrink automatically
at runtime.
Except that it can't shrink, because we don't have any convenient momentto remove an entry from the list, other than dropping the least recentlyused entry out of the list when inserting a new one. And since it's inshared mem, the allocated size is fixed at boot up, so we can't grow iteither.
It can shrink when a new scan looks through the list and passes by
entries that don't need to be in the list.


How do you identify an entry that doesn't need to be there?

I don't want to over-complicate things, so if you think having a hash
table is a better, simpler, or more readable design, I'll go with that
instead of the list-only approach. The only reason I suggested the list-
only approach was to see if the hash table was really gaining anything
for us. In practice, it will be quite rare that more than 10 different
big relations are being scanned at once, and I don't know how much of a
gain a hash table is when there are (in all but exceptional cases) only
a few entries.

How about implementing the list-only approach first, since you're goingto need the LRU list anyway, and we can add consider adding the hashtable later?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Re: [HACKERS] Sequential scans

Reply via email to