On Tue, Nov 1, 2016 at 9:09 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> [ new patches ] > > I looked over parts of this today, mostly the hashinsert.c changes.
Some more review... @@ -656,6 +678,10 @@ _hash_squeezebucket(Relation rel, IndexTuple itup; Size itemsz; + /* skip dead tuples */ + if (ItemIdIsDead(PageGetItemId(rpage, roffnum))) + continue; Is this an optimization independent of the rest of the patch, or is there something in this patch that necessitates it? i.e. Could this small change be committed independently? If not, then I think it needs a better comment explaining why it is now mandatory. - * Caller must hold exclusive lock on the target bucket. This allows + * Caller must hold cleanup lock on the target bucket. This allows * us to safely lock multiple pages in the bucket. The notion of a lock on a bucket no longer really exists; with this patch, we'll now properly speak of a lock on a primary bucket page. Also, I think the bit about safely locking multiple pages is bizarre -- that's not the issue at all: the problem is that reorganizing a bucket might confuse concurrent scans into returning wrong answers. I've included a broader updating of that comment, and some other comment changes, in the attached incremental patch, which also refactors your changes to _hash_freeovflpage() a bit to avoid some code duplication. Please consider this for inclusion in your next version. In hashutil.c, I think that _hash_msb() is just a reimplementation of fls(), which you can rely on being present because we have our own implementation in src/port. It's quite similar to yours but slightly shorter. :-) Also, some systems have a builtin fls() function which actually optimizes down to a single machine instruction, and which is therefore much faster than either version. I don't like the fact that _hash_get_newblk() and _hash_get_oldblk() are working out the bucket number by using the HashOpaque structure within the bucket page they're examining. First, it seems weird to pass the whole structure when you only need the bucket number out of it. More importantly, the caller really ought to know what bucket they care about without having to discover it. For example, in _hash_doinsert(), we figure out the bucket into which we need to insert, and we store that in a variable called "bucket". Then from there we work out the primary bucket page's block number, which we store in "blkno". We read the page into "buf" and put a pointer to that buffer's contents into "page" from which we discover the HashOpaque, a pointer to which we store into "pageopaque". Then we pass that to _hash_get_newblk() which will go look into that structure to find the bucket number ... but couldn't we have just passed "bucket" instead? Similarly, _hash_expandtable() has the value available in the variable "old_bucket". The only caller of _hash_get_oldblk() is _hash_first(), which has the bucket number available in a variable called "bucket". So it seems to me that these functions could be simplified to take the bucket number as an argument directly instead of the HashOpaque. Generally, this pattern recurs throughout the patch. Every time you use the data in the page to figure something out which the caller already knew, you're introducing a risk of bugs: what if the answers don't match? I think you should try to root out as much of that from this code as you can. As you may be able to tell, I'm working my way into this patch gradually, starting with peripheral parts and working toward the core of it. Generally, I think it's in pretty good shape, but I still have quite a bit left to study. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers