[HACKERS] Planned change of ExecRestrPos API

2005-05-15 Thread Tom Lane
I'm planning to change ExecRestrPos and the routines it calls so that
an updated TupleTableSlot holding the restored-to tuple is explicitly
returned.

Currently, since nothing is explicitly done to the result Slot of a
plan node when we restore its position, you might think that the Slot
still points at the tuple that was current just before the Restore.
You'd be wrong though, at least for seqscan and indexscan plans
(I haven't looked yet at the other node types that support
mark/restore).  The reason is that the restore operation changes
the contents of a HeapTupleData struct in the scan state (rs_ctup
or xs_ctup) and all that the Slot really contains is a pointer to
that struct.

Now this is really bad.  In the first place, the Slot thinks it
has a pin on the buffer containing its current tuple.  After a
Restore, it may have pin on the wrong buffer.  It seems to be
sheer chance that we've not had bugs due to this.  (The underlying
scan does have pin on the right buffer, but one can easily imagine
sequences in which the scan could be cleared while the Slot is
still assumed valid.)  As of CVS tip the consequences could be
even worse, because the Slot may contain some pointers to extracted
fields of the tuple, and these pointers are now out of sync with
the tuple that the Slot really contains.

So I think that it's essential that we explicitly update the scan
result Slot during ExecRestrPos.

It seems to be a good idea also to make the function return the Slot.
As far as I can tell, nodeMergeJoin has been depending on the assumption
that the physical address of the result slot doesn't change during
Restore.  Which is true for all the current plan types, but since
the ExecProcNode API isn't designed to assume that a node always
returns the same Slot, it doesn't seem like ExecRestrPos should either.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Planned change of ExecRestrPos API

2005-05-15 Thread Tom Lane
I wrote:
 Currently, since nothing is explicitly done to the result Slot of a
 plan node when we restore its position, you might think that the Slot
 still points at the tuple that was current just before the Restore.
 You'd be wrong though, at least for seqscan and indexscan plans
 (I haven't looked yet at the other node types that support
 mark/restore).

Actually, on looking closer, only seqscans have this problem --- and
ExecSeqRestrPos is really dead code anyway at the moment.  So rather
than go through a large exercise to change the mark/restore API, I've
just added some comments about what the API actually guarantees, and
tweaked ExecSeqRestrPos to clear the output slot instead of leaving it
in a potentially inconsistent state.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Planned change of ExecRestrPos API

2005-05-15 Thread Simon Riggs
On Sun, 2005-05-15 at 15:09 -0400, Tom Lane wrote:
 I'm planning to change ExecRestrPos and the routines it calls so that
 an updated TupleTableSlot holding the restored-to tuple is explicitly
 returned.
 
 Currently, since nothing is explicitly done to the result Slot of a
 plan node when we restore its position, you might think that the Slot
 still points at the tuple that was current just before the Restore.
 You'd be wrong though, at least for seqscan and indexscan plans
 (I haven't looked yet at the other node types that support
 mark/restore).  The reason is that the restore operation changes
 the contents of a HeapTupleData struct in the scan state (rs_ctup
 or xs_ctup) and all that the Slot really contains is a pointer to
 that struct.
 
 Now this is really bad.  In the first place, the Slot thinks it
 has a pin on the buffer containing its current tuple.  After a
 Restore, it may have pin on the wrong buffer.  

Sounds terrible. Is this a particular bug you're fixing?

 It seems to be
 sheer chance that we've not had bugs due to this.  

It isn't a very common case, thats why. You'd need to have value
duplicates in both joined columns, which is effectively a product join.
Granted it is syntactically allowable SQL.

AFAICS all joins should be between relations 1:1 or 1:M. A direct M:M
join is actually missing out the associative relation, or a non-key self
join. Such a join would rarely if ever have any correct and real
meaning. I can think of a few, but mostly its just incorrectly coded
SQL, or use of special values (e.g. blanks) rather than NULLs.

So my guess is that ExecRestrPos is almost never called.

 (The underlying
 scan does have pin on the right buffer, but one can easily imagine
 sequences in which the scan could be cleared while the Slot is
 still assumed valid.)  As of CVS tip the consequences could be
 even worse, because the Slot may contain some pointers to extracted
 fields of the tuple, and these pointers are now out of sync with
 the tuple that the Slot really contains.
 
 So I think that it's essential that we explicitly update the scan
 result Slot during ExecRestrPos.

Yeh, no problem.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend