Re: [HACKERS] Parallel Index Scans

Robert Haas Fri, 20 Jan 2017 11:46:49 -0800

On Fri, Jan 20, 2017 at 9:29 AM, Amit Kapila <[email protected]> wrote:
> Sure, if scan keys have changed then we can't continue, but this is
> the case where runtime keys are first time initialized.
>
> if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
>
> In the above check, the second part of the check
> (!node->iss_RuntimeKeysReady) ensures that it is for the first time.
> Now, let me give you an example to explain what bad can happen if we
> allow resetting parallel scan in this case.  Consider a query like
> select * from t1 where c1 < parallel_index(10);, in this if we allow
> resetting parallel scan descriptor during first time initialization of
> runtime keys, it can easily corrupt the parallel scan state.  Suppose
> leader has taken the lead and is scanning some page and worker reaches
> to initialize its keys in ExecReScanIndexScan(), if worker resets the
> parallel scan, then it will corrupt the state of the parallel scan
> state.


Hmm, I see.  So the problem if I understand it correctly is that every
participating process needs to update the backend-private state for
the runtime keys and only one of those processes can update the shared
state.  But in the case of a "real" rescan, even the shared state
needs to be reset.  OK, makes sense.

Why does btparallelrescan cater to the case where scan->parallel_scan
== NULL?  I would assume it should never get called in that case.
Also, I think ExecReScanIndexScan needs significantly better comments.
After some thought I see what's it's doing - mostly anyway - but I was
quite confused at first.  I still don't completely understand why it
needs this if-test:

+       /* reset (parallel) index scan */
+       if (node->iss_ScanDesc)
+       {

>> I extracted the generic portions of this infrastructure (i.e. not the
>> btree-specific stuff) and spent some time working on it today.  The
>> big thing I fixed was the documentation, which you added in a fairly
>> illogical part of the file.
>
> Hmm, it is not illogical. All the functions are described in the same
> order as they are declared in IndexAmRoutine structure and I have
> followed the same.

I see.  Sorry, I didn't realize that was what you were going for.

> I think both amestimateparallelscan and
> aminitparallelscan should be added one para down which says (The
> purpose of an index .. The scan-related functions that an index access
> method must or may provide are:).

I think it's a good idea to put all three of those functions together
in the listing, similar to what we did in
69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs.  After all they are
closely related in purpose, and it may be easiest to understand if
they are next to each other in the listing.  I suggest that we move
them to the end in IndexAmRoutine similar to the way FdwRoutine was
done; in other words, my idea is to make the structure consistent with
the way that I revised the documentation instead of making the
documentation consistent with the order you picked for the structure
members.  What I like about that is that it gives a good opportunity
to include some general remarks on parallel index scans in a central
place, as I did in that patch.  Also, it makes it easier for people
who care about parallel index scans to find all of the related things
(since they are together) and for people who don't care about them to
ignore it all (for the same reason).  What do you think about that
approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Index Scans

Reply via email to