On Wed, Nov 9, 2016 at 12:11 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Nov 9, 2016 at 11:41 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Wed, Nov 9, 2016 at 9:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Wed, Nov 9, 2016 at 9:04 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> I think we can give a brief explanation right in the code comment.  I
>>> referred to "decreasing the TIDs"; you are referring to "moving tuples
>>> around".  But I think that moving the tuples to different locations is
>>> not the problem.  I think the problem is that a tuple might be
>>> assigned a lower spot in the item pointer array
>> I think we both understand the problem and it is just a matter of
>> using different words.  I will go with your suggestion and will try to
>> slightly adjust the README as well so that both places use the same
>> terminology.
> Yes, I think we're on the same page.

Some more review:

The API contract of _hash_finish_split seems a bit unfortunate.  The
caller is supposed to have obtained a cleanup lock on both the old and
new buffers, but the first thing it does is walk the entire new bucket
chain, completely ignoring the old one.  That means holding a cleanup
lock on the old buffer across an unbounded number of I/O operations --
which also means that you can't interrupt the query by pressing ^C,
because an LWLock (on the old buffer) is held.  Moreover, the
requirement to hold a lock on the new buffer isn't convenient for
either caller; they both have to go do it, so why not move it into the
function?  Perhaps the function should be changed so that it
guarantees that a pin is held on the primary page of the existing
bucket, but no locks are held.
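The suggested contract change might be expressed in the function's header comment roughly like this (a sketch of the proposal only, not the current code, and the wording is mine):

```c
/*
 * _hash_finish_split -- finish the incomplete split of a bucket
 *
 * Proposed contract (sketch): on entry the caller holds only a pin on
 * the primary page of the old bucket; no buffer content locks are held
 * on entry or on exit.  The function acquires and releases whatever
 * locks it needs internally, so the cleanup lock on the old bucket is
 * not held across the scan of the new bucket chain (an unbounded amount
 * of I/O), and the query remains interruptible in between.
 */
```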

Where _hash_finish_split does LockBufferForCleanup(bucket_nbuf),
should it instead be trying to get the lock conditionally and
returning immediately if it doesn't get the lock?  Seems like a good
idea.

     * We're at the end of the old bucket chain, so we're done partitioning
     * the tuples.  Mark the old and new buckets to indicate split is
     * finished.
     * To avoid deadlocks due to locking order of buckets, first lock the old
     * bucket and then the new bucket.

These comments have drifted too far from the code to which they refer.
The first part is basically making the same point as the
slightly-later comment /* indicate that split is finished */.

The use of _hash_relbuf, _hash_wrtbuf, and _hash_chgbufaccess is
coming to seem like a horrible idea to me.  That's not your fault - it
was like this before - but maybe in a followup patch we should
consider ripping all of that out and just calling MarkBufferDirty(),
ReleaseBuffer(), LockBuffer(), UnlockBuffer(), and/or
UnlockReleaseBuffer() as appropriate.  As far as I can see, the
current style is just obfuscating the code.
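For reference, those wrappers mostly reduce to thin combinations of the buffer-manager primitives anyway; roughly (from memory, worth double-checking against hashpage.c -- this is a sketch, not compilable code):

```c
/*
 *   _hash_relbuf(rel, buf)        =>  UnlockReleaseBuffer(buf);
 *
 *   _hash_wrtbuf(rel, buf)        =>  MarkBufferDirty(buf);
 *                                     UnlockReleaseBuffer(buf);
 *
 *   _hash_chgbufaccess(rel, buf,
 *                      from, to)  =>  MarkBufferDirty(buf) if from
 *                                     was HASH_WRITE, then an unlock
 *                                     and/or LockBuffer() call per
 *                                     the from/to access modes
 */
```

Spelled out like that, the direct calls say exactly what happens to the buffer at each site, which is the point.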

                itupsize = new_itup->t_info & INDEX_SIZE_MASK;
                new_itup->t_info &= ~INDEX_SIZE_MASK;
                new_itup->t_info |= INDEX_MOVED_BY_SPLIT_MASK;
                new_itup->t_info |= itupsize;

If I'm not mistaken, you could omit the first, second, and fourth
lines here and keep only the third one, and it would do exactly the
same thing.  The first line saves the bits in INDEX_SIZE_MASK.  The
second line clears the bits in INDEX_SIZE_MASK.   The fourth line
re-sets the bits that were originally saved.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)