Thanks Ashutosh for reviewing and providing your comments.
On Fri, Oct 23, 2020 at 5:43 PM Ashutosh Sharma <[email protected]> wrote:
>
> Hi Vignesh,
>
> Thanks for the updated patches. Here are some more comments that I can
> find after reviewing your latest patches:
>
> +/*
> + * This structure helps in storing the common data from CopyStateData that
> are
> + * required by the workers. This information will then be allocated and
> stored
> + * into the DSM for the worker to retrieve and copy it to CopyStateData.
> + */
> +typedef struct SerializedParallelCopyState
> +{
> + /* low-level state data */
> + CopyDest copy_dest; /* type of copy source/destination */
> + int file_encoding; /* file or remote side's character encoding */
> + bool need_transcoding; /* file encoding diff from server? */
> + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
> +
> ...
> ...
> +
> + /* Working state for COPY FROM */
> + AttrNumber num_defaults;
> + Oid relid;
> +} SerializedParallelCopyState;
>
> Can the above structure not be part of the CopyStateData structure? I
> am just asking this question because all the fields present in the
> above structure are also present in the CopyStateData structure. So,
> including it in the CopyStateData structure will reduce the code
> duplication and will also make CopyStateData a bit shorter.
>
I have removed the common members from the structure, now there are no
common members between CopyStateData & the new structure. I'm using
CopyStateData to copy to/from directly in the new patch.
> --
>
> + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
> + relid);
>
> Do we need to pass cstate->nworkers and relid to BeginParallelCopy()
> function when we are already passing cstate structure, using which
> both of these information can be retrieved ?
>
nworkers need not be passed as you have suggested but relid need to be
passed as we will be setting it to pcdata, modified nworkers as
suggested.
> --
>
> +/* DSM keys for parallel copy. */
> +#define PARALLEL_COPY_KEY_SHARED_INFO 1
> +#define PARALLEL_COPY_KEY_CSTATE 2
> +#define PARALLEL_COPY_WAL_USAGE 3
> +#define PARALLEL_COPY_BUFFER_USAGE 4
>
> DSM key names do not appear to be consistent. For shared info and
> cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY",
> but for WalUsage and BufferUsage structures, it is prefixed with
> "PARALLEL_COPY". I think it would be better to make them consistent.
>
Modified as suggested
> --
>
> if (resultRelInfo->ri_TrigDesc != NULL &&
> (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
> resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
> {
> /*
> * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
> * triggers on the table. Such triggers might query the table we're
> * inserting into and act differently if the tuples that have already
> * been processed and prepared for insertion are not there.
> */
> insertMethod = CIM_SINGLE;
> }
> else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
> resultRelInfo->ri_TrigDesc->trig_insert_new_table)
> {
> /*
> * For partitioned tables we can't support multi-inserts when there
> * are any statement level insert triggers. It might be possible to
> * allow partitioned tables with such triggers in the future, but for
> * now, CopyMultiInsertInfoFlush expects that any before row insert
> * and statement level insert triggers are on the same relation.
> */
> insertMethod = CIM_SINGLE;
> }
> else if (resultRelInfo->ri_FdwRoutine != NULL ||
> cstate->volatile_defexprs)
> {
> ...
> ...
>
> I think, if possible, all these if-else checks in CopyFrom() can be
> moved to a single function which can probably be named as
> IdentifyCopyInsertMethod() and this function can be called in
> IsParallelCopyAllowed(). This will ensure that in case of Parallel
> Copy when the leader has performed all these checks, the worker won't
> do it again. I also feel that it will make the code look a bit
> cleaner.
>
In the recent patch posted we have changed it to simplify the check
for parallel copy, it is not an exact match. I feel this comment is
not applicable on the latest patch
> --
>
> +void
> +ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
> +{
> ...
> ...
> + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
> + &walusage[ParallelWorkerNumber]);
> +
> + MemoryContextSwitchTo(oldcontext);
> + pfree(cstate);
> + return;
> +}
>
> It seems like you also need to delete the memory context
> (cstate->copycontext) here.
>
Added it.
> --
>
> +void
> +ExecBeforeStmtTrigger(CopyState cstate)
> +{
> + EState *estate = CreateExecutorState();
> + ResultRelInfo *resultRelInfo;
>
> This function has a lot of comments which have been copied as it is
> from the CopyFrom function, I think it would be good to remove those
> comments from here and mention that this code changes done in this
> function has been taken from the CopyFrom function. If any queries
> people may refer to the CopyFrom function. This will again avoid the
> unnecessary code in the patch.
>
Changed as suggested.
> --
>
> As Heikki rightly pointed out in his previous email, we need some high
> level description of how Parallel Copy works somewhere in
> copyparallel.c file. For reference, please see how a brief description
> about parallel vacuum has been added in the vacuumlazy.c file.
>
> * Lazy vacuum supports parallel execution with parallel worker processes. In
> * a parallel vacuum, we perform both index vacuum and index cleanup with
> * parallel worker processes. Individual indexes are processed by one vacuum
> ...
Added it in copyparallel.c
This is addressed in v9 patch shared at [1].
[1] -
https://www.postgresql.org/message-id/caldanm1caonkfdn6k72dsirpgqngvwxql7tjeihz58opnp9...@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com