From: Robert Haas <robertmh...@gmail.com>
> On Wed, Mar 24, 2021 at 12:48 AM Andres Freund <and...@anarazel.de>
> wrote:
> > Although this specific hack doesn't seem too terrible to me. If you
> > execute a parallel insert the likelihood to end up not needing an xid is
> > pretty low. Implementing it concurrently does seem like it'd end up
> > needing another lwlock nested around xid assignment, or some more
> > complicated scheme with already holding XidGenLock or retries. But maybe
> > I'm missing an easy solution here.
> 
> I don't think you need to do anything that is known outside the group
> of processes involved in the parallel query. I think you just need to
> make sure that only one of them is trying to acquire an XID at a time,
> and that all the others find out about it. I haven't thought too hard
> about the timing: if one process acquires an XID for the transaction,
> is it OK if the others do an arbitrary amount of work before they
> realize that this has happened? Also, there's the problem that the
> leader has the whole transaction stack and the workers don't, so the
> recursive nature of XID acquisition is a problem. I suspect these are
> all pretty solvable problems; I just haven't put in the energy. But,
> it could also be that I'm missing something.

It doesn't seem easy to have a parallel worker allocate an XID and share it
among the parallel processes.  When the DML runs inside a deeply nested
subtransaction and the parent transactions have not allocated their XIDs yet,
the worker needs to allocate XIDs for those parents as well.  That
indeterminate number of XIDs must be stored in shared memory, and the stack of
TransactionState structures must also be passed to the workers.  Also,
TransactionIdIsCurrentTransactionId() relies on the ParallelCurrentXids array,
through which parallel workers receive sub-committed XIDs from the leader.
That would need to be reconsidered too.
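
To illustrate the nested-subtransaction point, here is a toy sketch.  It is not
PostgreSQL code: ToyXactState, toy_next_xid and toy_assign_xid are hypothetical
names that only loosely model the TransactionState stack, and a plain counter
stands in for real XID generation.  It assumes that a parent must get its XID
before its child does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;                /* stand-in for TransactionId */
#define TOY_INVALID_XID ((ToyXid) 0)

/* One level of the (sub)transaction stack (hypothetical, simplified). */
typedef struct ToyXactState
{
    ToyXid      xid;                    /* TOY_INVALID_XID until assigned */
    struct ToyXactState *parent;        /* NULL for the top-level transaction */
} ToyXactState;

/* Toy counter; the real generator is shared and protected by XidGenLock. */
ToyXid      toy_next_xid = 100;

/*
 * Assign an XID to "s", first assigning XIDs to every ancestor that still
 * lacks one, so that a parent's XID always precedes its child's.
 * Returns the number of XIDs newly assigned by this call.
 */
int
toy_assign_xid(ToyXactState *s)
{
    int         assigned = 0;

    assert(s != NULL);

    if (s->xid != TOY_INVALID_XID)
        return 0;                       /* already has an XID */

    if (s->parent != NULL)
        assigned += toy_assign_xid(s->parent);

    s->xid = toy_next_xid++;
    return assigned + 1;
}
```

With a three-level stack (top, sub1, sub2) where nothing has an XID yet,
calling toy_assign_xid() on the innermost level assigns three XIDs in top-down
order, and a second call assigns none.  The count depends on the nesting depth,
which is why a worker doing this would have to publish a variable number of
XIDs to the other processes.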

More fundamentally, I don't see the need for parallel workers to allocate the
XID.  As the following Oracle manual says, parallel DML will be used in data
analytics and OLTP batch jobs.  There should be plenty of source data in those
scenarios, so it is unlikely that the statement ends up not needing an XID.


When to Use Parallel DML
https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-18B2AF09-C548-48DE-A794-86224111549F
--------------------------------------------------
Several scenarios where parallel DML is used include:

Refreshing Tables in a Data Warehouse System

Creating Intermediate Summary Tables

Using Scoring Tables

Updating Historical Tables

Running Batch Jobs
--------------------------------------------------


I don't mean that we should take the easy hack because we are lazy.  I'd
like to know whether we *really* need the extra effort.  And I want PostgreSQL
to provide great, competitive features as early as possible without messing up
the design and code.

In what realistic scenarios would we need a sophisticated XID assignment
mechanism in parallel workers?


Regards
Takayuki Tsunakawa

