I'm starting to review the "synchronized scans" and "scan-resistant
buffer cache" patches. The patches have complex interactions so I'm
taking a holistic approach.
There are four outstanding issues with the sync scans in particular:
1. The simplistic hash approach. While it's nice to not have a lock, I'm
worried about collisions. If you had a collision every now and then, it
wouldn't be that bad, but because the hash value is computed from the
oid, a collision would be persistent. If you create a database and
happen to have two frequently seqscanned tables that collide, the only
way to get rid of the collision is to drop and recreate a table.
Granted, that'd probably be very rare in practice, but when it happens
it would be next to impossible to figure out what's going on.
Let's use a normal hash table instead, and use a lock to protect it. If
we only update it every 10 pages or so, the overhead should be
negligible. To further reduce contention, we could modify ReadBuffer to
let the caller know if the read resulted in a physical read or not, and
only update the entry when a page is physically read in. That way all
the synchronized scanners wouldn't be updating the same value, just the
one performing the I/O. And while we're at it, let's use the full
relfilenode instead of just the table oid in the hash.
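A minimal sketch of what I have in mind, as standalone C rather than actual backend code: the function and type names (ss_report_location, ss_get_location, ScanLocEntry) are illustrative, a pthread mutex stands in for an LWLock, and the report interval of 10 pages is the figure suggested above. Because the full relfilenode is stored in the entry, a bucket collision causes a lookup miss rather than a persistently wrong position:

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define SS_TABLE_SIZE 64              /* number of hash buckets */
#define SYNC_SCAN_REPORT_INTERVAL 10  /* report every N pages */

typedef struct RelFileNodeKey {       /* full relfilenode, not just the OID */
    uint32_t spcNode;                 /* tablespace */
    uint32_t dbNode;                  /* database */
    uint32_t relNode;                 /* relation file */
} RelFileNodeKey;

typedef struct ScanLocEntry {
    RelFileNodeKey key;
    int valid;
    uint32_t blocknum;                /* last reported block */
} ScanLocEntry;

static ScanLocEntry ss_table[SS_TABLE_SIZE];
static pthread_mutex_t ss_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned ss_hash(const RelFileNodeKey *k)
{
    return (k->spcNode * 31u + k->dbNode) * 31u + k->relNode;
}

/* Report the current scan position.  Only every Nth page is reported,
 * and ideally only by the backend that performed the physical read. */
void ss_report_location(const RelFileNodeKey *key, uint32_t blocknum)
{
    if (blocknum % SYNC_SCAN_REPORT_INTERVAL != 0)
        return;
    pthread_mutex_lock(&ss_lock);
    ScanLocEntry *e = &ss_table[ss_hash(key) % SS_TABLE_SIZE];
    /* On a bucket collision we simply overwrite the entry; the full key
     * is stored, so a later lookup for the other relation just misses. */
    e->key = *key;
    e->valid = 1;
    e->blocknum = blocknum;
    pthread_mutex_unlock(&ss_lock);
}

/* Return the reported start block for a new scan, or 0 if none known. */
uint32_t ss_get_location(const RelFileNodeKey *key)
{
    uint32_t result = 0;
    pthread_mutex_lock(&ss_lock);
    ScanLocEntry *e = &ss_table[ss_hash(key) % SS_TABLE_SIZE];
    if (e->valid && memcmp(&e->key, key, sizeof(*key)) == 0)
        result = e->blocknum;
    pthread_mutex_unlock(&ss_lock);
    return result;
}
```

With the report interval check done before taking the lock, a scan touches the lock only once per 10 pages, so contention should stay negligible even with many concurrent scanners.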
2. Under what circumstances does the patch help and when does it hurt? I
think the patch is safe in that it should never be any worse than what
we have now. But when does it help? That needs to be looked at together
with the other patch.
I need to dig through the archives for the performance test results you
posted earlier and try to understand them.
There are six distinct scenarios I've come up with thus far that need to
be looked at:
A. A seq scan on a small table
B. A seq scan on a table that's 110% the size of shared_buffers, but
smaller than RAM
C. A seq scan on a table that's 110% the size of RAM
D. A seq scan on a huge table
E. Two simultaneous seq scans on a large table starting at the same time
F. Two simultaneous seq scans on a large table, 2nd one starting when
the 1st one is halfway through
Also, does it change things if you have a bulk update instead of a
read-only query? How about bitmap heap scans and large index scans? And
vacuums? The above scenarios need to be considered both alone, and in
the presence of another OLTP kind of workload.
I realize that we can't have everything, and as long as we get some
significant benefit in some scenarios, and don't hurt others, the patch
is worthwhile. But let's try to cover as much as we reasonably can.
One random idea I had to cover B & C without having the offset variable:
Start scanning *backwards* from the page that's in the shared hash
table, until you hit a page that's not in buffer cache. Then you
continue scanning forwards from the page you started from.
This needs more thought but I think we can come up with a pretty simple
solution that covers the most common cases.
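To make the backwards-probe idea concrete, here is a toy sketch (not patch code): choose_start_block walks backwards from the position found in the shared hash table while the preceding pages are still cached, and returns where the cached run begins. The cached[] array is a stand-in for a real buffer-cache probe, and the function name is my invention:

```c
#include <stdint.h>

/* Given the block number reported in the shared hash table and a
 * per-block "is it in the buffer cache" predicate, find how far back
 * the cached tail of the previous scan extends.  The new scan can read
 * [result, reported) straight from cache before continuing forwards
 * from 'reported', instead of re-reading it from disk later. */
uint32_t choose_start_block(uint32_t reported, const int *cached)
{
    uint32_t blk = reported;
    while (blk > 0 && cached[blk - 1])
        blk--;                       /* previous page still cached */
    return blk;                      /* first block of the cached run */
}
```

For scenario B (table slightly larger than shared_buffers), this would let a new scan consume the still-resident tail of the previous scan first, rather than starting cold.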
3. By having different backends doing the reads, are we destroying OS
readahead as Tom suggested? I remember you performed some tests on that,
and it was a problem on some systems but not on others. This needs some
thought, there may be some simple way to address that.
4. It fails regression tests. You get an assertion failure on the portal
test. I believe that changing the direction of a scan isn't handled
properly; it's probably pretty easy to fix.
Jeff, could you please fix 1 and 4? I'll give 2 and 3 some more thought,
and take a closer look at the scan-resistant scans patch. Any comments
and ideas are welcome, of course.