Hello Robert,

>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE.  It seems like one of those two things has
>to be wrong.

btps_pageStatus is to be set to BTPARALLEL_DONE only by the first worker
which is
performing scan for latest array key and which has encountered end of scan.
This is ensured by
following check in _bt_parallel_done(),

   if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
       btscan->btps_pageStatus != BTPARALLEL_DONE)

Thus, BTPARALLEL_DONE marks end of scan only for latest array keys. This
ensures that when any worker reaches
_bt_advance_array_keys() it advances latest scan which is marked by
btscan->btps_arrayKeyCount only when latest scan
has ended by checking if(btps_pageStatus == BTPARALLEL_DONE) in
_bt_advance_array_keys(). Otherwise, the worker just
advances its local so->arrayKeyCount.

I hope this provides some clarification.

Thank you,
Rahila Syed





On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> In spite of being careful, I missed reorganizing the functions in
> >> genam.h which I have done in attached patch.
> >
> > Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.
>
> Reviewing parallel_index_scan_v6.patch:
>
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
>
> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
>
> The hunk in indexam.c looks like a leftover that can be omitted.
>
> +/*
> + * Below flags are used to indicate the state of parallel scan.
>
> They aren't really flags any more; they're members of an enum.  I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values.  But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
>
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel
> btree scan
>
> Initializing -> initialize
>
> + *  btparallelrescan() -- reset parallel scan
>
> Previous two prototypes have one dash, this one has two.  Make it
> consistent, please.
>
> +     * Ideally, we don't need to acquire spinlock here, but being
> +     * consistent with heap_rescan seems to be a good idea.
>
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
>
> + * _bt_parallel_seize() -- returns the next block to be scanned for
> forward
> + *      scans and latest block scanned for backward scans.
>
> I think the description should be more like "Begin the process of
> advancing the scan to a new page.  Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()."  And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page.  We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
>
> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE.  It seems like one of those two things has
> to be wrong.
>
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
>
> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument.  It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward?  That
> is a little odd.  Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
>
>             /* nope, keep going */
> vs.
>             /*
>              * For parallel scans, get the last page scanned as it is quite
>              * possible that by the time we try to fetch previous page,
> other
>              * worker has also decided to scan that previous page.  We
> could
>              * avoid that by doing _bt_parallel_release once we have read
> the
>              * current page, but it is bad to make other workers wait till
> we
>              * read the page.
>              */
>
> Now it seems to me that these cases are symmetric and the issues are
> identical.  It's basically that, while we were judging whether the
> current page has useful contents, some other process could have
> advanced the scan (since _bt_readpage un-seized it).
>
> -            /* check for deleted page */
>              page = BufferGetPage(so->currPos.buf);
>              TestForOldSnapshot(scan->xs_snapshot, rel, page);
>              opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +            /* check for deleted page */
>
> This is an independent change; committed.
>
> What kind of testing has been done to ensure that this doesn't have
> concurrency bugs?  What's the highest degree of parallelism that's
> been tested?  Did that test include array keys?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Reply via email to