>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE. It seems like one of those two things has
>to be wrong.
btps_pageStatus is to be set to BTPARALLEL_DONE only by the first worker
performing scan for latest array key and which has encountered end of scan.
This is ensured by
following check in _bt_parallel_done(),
if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
btscan->btps_pageStatus != BTPARALLEL_DONE)
Thus, BTPARALLEL_DONE marks end of scan only for latest array keys. This
ensures that when any worker reaches
_bt_advance_array_keys() it advances latest scan which is marked by
btscan->btps_arrayKeyCount only when latest scan
has ended by checking if(btps_pageStatus == BTPARALLEL_DONE) in
_bt_advance_array_keys(). Otherwise, the worker just
advances its local so->arrayKeyCount.
I hope this provides some clarification.
On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmh...@gmail.com>
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapil...@gmail.com>
> >> In spite of being careful, I missed reorganizing the functions in
> >> genam.h which I have done in attached patch.
> > Cool. Committed parallel-generic-index-scan.2.patch. Thanks.
> Reviewing parallel_index_scan_v6.patch:
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
> The hunk in indexam.c looks like a leftover that can be omitted.
> + * Below flags are used to indicate the state of parallel scan.
> They aren't really flags any more; they're members of an enum. I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values. But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel
> btree scan
> Initializing -> initialize
> + * btparallelrescan() -- reset parallel scan
> Previous two prototypes have one dash, this one has two. Make it
> consistent, please.
> + * Ideally, we don't need to acquire spinlock here, but being
> + * consistent with heap_rescan seems to be a good idea.
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
> + * _bt_parallel_seize() -- returns the next block to be scanned for
> + * scans and latest block scanned for backward scans.
> I think the description should be more like "Begin the process of
> advancing the scan to a new page. Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()." And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page. We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE. It seems like one of those two things has
> to be wrong.
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument. It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward? That
> is a little odd. Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
> /* nope, keep going */
> * For parallel scans, get the last page scanned as it is quite
> * possible that by the time we try to fetch previous page,
> * worker has also decided to scan that previous page. We
> * avoid that by doing _bt_parallel_release once we have read
> * current page, but it is bad to make other workers wait till
> * read the page.
> Now it seems to me that these cases are symmetric and the issues are
> identical. It's basically that, while we were judging whether the
> current page has useful contents, some other process could have
> advanced the scan (since _bt_readpage un-seized it).
> - /* check for deleted page */
> page = BufferGetPage(so->currPos.buf);
> TestForOldSnapshot(scan->xs_snapshot, rel, page);
> opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> + /* check for deleted page */
> This is an independent change; committed.
> What kind of testing has been done to ensure that this doesn't have
> concurrency bugs? What's the highest degree of parallelism that's
> been tested? Did that test include array keys?
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company