Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
The implementation has changed a bit since August. I thought I had
submitted an updated version in the winter but couldn't find it. Anyway,
I updated and dusted off the source tree, tidied up the comments a
little bit, and fixed some inconsistencies in pg_proc entries that made
opr_sanity to fail.

I started looking at this patch.  My first reaction is that I liked last
August's API (an independent "suggestblock" call) a lot better.  I think
trying to hold an index page lock across the heap insert is an extremely
bad idea; it will hurt concurrency and possibly cause deadlocks
(especially for unique indexes).

The index page is not kept locked across the calls. Just pinned.

The reason for switching to the new API instead of the amsuggestblock API is CPU overhead. It avoids constructing the IndexTuple twice and descending the tree twice.

Clustering is mainly useful when the table doesn't fit in cache, so one could argue that if you care about clustering you're most likely I/O bound and don't care about the CPU overhead that much. Nevertheless, avoiding it seems like a good idea to me.

The amsuggestblock API is simpler, though. We might choose it on those grounds alone.

The other question is why is execMain involved in this?  That makes the
design nonfunctional for tuples inserted in any other way than through
the main executor --- COPY for instance.  Also, if this is successful,
I could see using it on system catalogs eventually.  I'm inclined to
think that the right design is for heap_insert to call amsuggestblock
for itself, or maybe even push that down to RelationGetBufferForTuple.
(Note: having heap_insert contain logic that duplicates
RelationGetBufferForTuple's is another bit of bad design here, but
that's at least correctable locally.)  Now the difficulty in that is
that the heapam.c routines don't have hold of any data structure
containing index knowledge ... but they do have hold of the Relation
structure for the heap.  I suggest making RelationGetIndexList() cache
the OID of the clustered index (if any) in relcache entries, and add
RelationGetClusterIndex much like RelationGetOidIndex, and then
heap_insert can use that.

Hmm. My first reaction on that is that having heap_insert reach into an index is a modularity violation. It wouldn't be too hard to make similar changes to COPY that I did to the executor.

I doubt it's very useful for system catalogs; they should never grow very large, and should stay mostly cached anyway.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to