On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>> I am suspicious that _hash_kill_items() is going to have problems if
>> the overflow page is freed before it reacquires the lock.
>> _btkillitems() contains safeguards against similar cases.
> I have added similar check for hash_kill_items() as well.

I think hash_kill_items has a much bigger problem which is that we
can't kill the items if the page has been modified after re-reading
it.  Consider a case where Vacuum starts before the Scan on the bucket
and then Scan went ahead (which is possible after your 0003 patch) and
noted the killed items in killed item array and before it could kill
all the items, Vacuum remove those items.  Now it is quite possible
that before scan tries to kill those items, the corresponding itemids
have been used by different tuples.  I think what we need to do here
is to store the LSN of page first time when we have read the page and
then compare it with current page lsn after re-reading the page in

+ HashScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
+} HashScanPosData;

HashScanPosItem *killedItems; /* tids and offset numbers of killed items */
  int numKilled; /* number of currently stored items */
+ /*
+ * Identify all the matching items on a page and save them
+ * in HashScanPosData
+ */
+ HashScanPosData currPos; /* current position data */
 } HashScanOpaqueData;

After having array of HashScanPosItems as currPos.items, I think you
don't need array of HashScanPosItem for killedItems, just an integer
array of index in currPos.items should be sufficient.

I think below para in README is not valid after this patch.

"To keep concurrency reasonably good, we require readers to cope with
concurrent insertions, which means that they have to be able to
their current scan position after re-acquiring the buffer content lock
on page.  Since deletion is not possible while a reader holds the pin
on bucket, and we assume that heap tuple TIDs are unique, this can be
implemented by searching for the same heap tuple TID previously
returned.  Insertion does not move index entries across pages, so the
previously-returned index entry should always be on the same page, at
the same or higher offset number,
as it was before."

- next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
-   bstrategy);

- * release the lock on previous page after acquiring the lock on next
- * page
+ * As the hash index scan work in page at a time mode,
+ * vacuum can release the lock on previous page before
+ * acquiring lock on the next page.
+ next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
+   bstrategy);

After this change, you need to modify comments on top of function
hashbucketcleanup() and _hash_squeezebucket().

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to