Hello Robert, >I am a bit mystified about how this manages to work with array keys. >_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE >unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but >_bt_parallel_advance_scan() won't do anything unless btps_pageStatus >is already BTPARALLEL_DONE. It seems like one of those two things has >to be wrong.
btps_pageStatus is to be set to BTPARALLEL_DONE only by the first worker which is performing scan for latest array key and which has encountered end of scan. This is ensured by following check in _bt_parallel_done(), if (so->arrayKeyCount >= btscan->btps_arrayKeyCount && btscan->btps_pageStatus != BTPARALLEL_DONE) Thus, BTPARALLEL_DONE marks end of scan only for latest array keys. This ensures that when any worker reaches _bt_advance_array_keys() it advances latest scan which is marked by btscan->btps_arrayKeyCount only when latest scan has ended by checking if(btps_pageStatus == BTPARALLEL_DONE) in _bt_advance_array_keys(). Otherwise, the worker just advances its local so->arrayKeyCount. I hope this provides some clarification. Thank you, Rahila Syed On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmh...@gmail.com> > wrote: > > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapil...@gmail.com> > wrote: > >> In spite of being careful, I missed reorganizing the functions in > >> genam.h which I have done in attached patch. > > > > Cool. Committed parallel-generic-index-scan.2.patch. Thanks. > > Reviewing parallel_index_scan_v6.patch: > > I think it might be better to swap the return value and the out > parameter for _bt_parallel_seize; that is, return a bool, and have > callers ignore the value of the out parameter (e.g. *pageno). > > I think _bt_parallel_advance_scan should be renamed something that > includes the word "keys", like _bt_parallel_advance_array_keys. > > The hunk in indexam.c looks like a leftover that can be omitted. > > +/* > + * Below flags are used to indicate the state of parallel scan. > > They aren't really flags any more; they're members of an enum. I > think you could just leave this sentence out entirely and start right > in with descriptions of the individual values. But maybe all of those > descriptions should end in a period (instead of one ending in a period > but not the other three) since they read like complete sentences. > > + * btinitparallelscan - Initializing BTParallelScanDesc for parallel > btree scan > > Initializing -> initialize > > + * btparallelrescan() -- reset parallel scan > > Previous two prototypes have one dash, this one has two. Make it > consistent, please. > > + * Ideally, we don't need to acquire spinlock here, but being > + * consistent with heap_rescan seems to be a good idea. > > How about: In theory, we don't need to acquire the spinlock here, > because there shouldn't be any other workers running at this point, > but we do so for consistency. > > + * _bt_parallel_seize() -- returns the next block to be scanned for > forward > + * scans and latest block scanned for backward scans. > > I think the description should be more like "Begin the process of > advancing the scan to a new page. Other scans must wait until we call > bt_parallel_release() or bt_parallel_done()." And likewise > _bt_parallel_release() should say something like "Complete the process > of advancing the scan to a new page. We now have the new value for > btps_scanPage; some other backend can now begin advancing the scan." > And _bt_parallel_done should say something like "Mark the parallel > scan as complete." > > I am a bit mystified about how this manages to work with array keys. > _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE > unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but > _bt_parallel_advance_scan() won't do anything unless btps_pageStatus > is already BTPARALLEL_DONE. It seems like one of those two things has > to be wrong. > > _bt_readpage's header comment should be updated to note that, in the > case of a parallel scan, _bt_parallel_seize should have been called > prior to entering this function, and _bt_parallel_release will be > called prior to return (or this could alternatively be phrased in > terms of btps_pageStatus on entry/exit). > > _bt_readnextpage isn't very clear about the meaning of its blkno > argument. It looks like it always has to be valid when scanning > forward, but only in the parallel case when scanning backward? That > is a little odd. Another, possibly related thing that is odd is that > when _bt_steppage() finds no tuples and decides to advance to a new > page again, there's a very short comment in the forward case and a > very long comment in the backward case: > > /* nope, keep going */ > vs. > /* > * For parallel scans, get the last page scanned as it is quite > * possible that by the time we try to fetch previous page, > other > * worker has also decided to scan that previous page. We > could > * avoid that by doing _bt_parallel_release once we have read > the > * current page, but it is bad to make other workers wait till > we > * read the page. > */ > > Now it seems to me that these cases are symmetric and the issues are > identical. It's basically that, while we were judging whether the > current page has useful contents, some other process could have > advanced the scan (since _bt_readpage un-seized it). > > - /* check for deleted page */ > page = BufferGetPage(so->currPos.buf); > TestForOldSnapshot(scan->xs_snapshot, rel, page); > opaque = (BTPageOpaque) PageGetSpecialPointer(page); > + /* check for deleted page */ > > This is an independent change; committed. > > What kind of testing has been done to ensure that this doesn't have > concurrency bugs? What's the highest degree of parallelism that's > been tested? Did that test include array keys? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >