Re: New Table Access Methods for Multi and Single Inserts

Andres Freund Sat, 03 Jun 2023 15:39:22 -0700

Hi,

This patch was referenced in a discussion at pgcon, so I thought I'd give it a
look, even though Bharat said that he won't have time to drive it forward...



On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/access/heap/heapam_handler.c 
> b/src/backend/access/heap/heapam_handler.c
> index bd5faf0c1f..655de8e6b7 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
>       .tuple_insert_speculative = heapam_tuple_insert_speculative,
>       .tuple_complete_speculative = heapam_tuple_complete_speculative,
>       .multi_insert = heap_multi_insert,
> +     .tuple_insert_begin = heap_insert_begin,
> +     .tuple_insert_v2 = heap_insert_v2,
> +     .multi_insert_v2 = heap_multi_insert_v2,
> +     .multi_insert_flush = heap_multi_insert_flush,
> +     .tuple_insert_end = heap_insert_end,
>       .tuple_delete = heapam_tuple_delete,
>       .tuple_update = heapam_tuple_update,
>       .tuple_lock = heapam_tuple_lock,

I don't think we should have multiple callback for the insertion APIs in
tableam.h. I think it'd be good to continue supporting the old table_*()
functions, but supporting multiple insert APIs in each AM doesn't make much
sense to me.


> +/*
> + * GetTupleSize - Compute the tuple size given a table slot.
> + *
> + * For heap tuple, buffer tuple and minimal tuple slot types return the 
> actual
> + * tuple size that exists. For virtual tuple, the size is calculated as the
> + * slot does not have the tuple size. If the computed size exceeds the given
> + * maxsize for the virtual tuple, this function exits, not investing time in
> + * further unnecessary calculation.
> + *
> + * Important Notes:
> + * 1) Size calculation code for virtual slots is being used from
> + *     tts_virtual_materialize(), hence ensure to have the same changes or 
> fixes
> + *     here and also there.
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal 
> and
> + *     virtual slots. Ensure to add related code in case any new slot type is
> + *    introduced.
> + */
> +inline Size
> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
> +{
> +     Size sz = 0;
> +     HeapTuple tuple = NULL;
> +
> +     if (TTS_IS_HEAPTUPLE(slot))
> +             tuple = ((HeapTupleTableSlot *) slot)->tuple;
> +     else if(TTS_IS_BUFFERTUPLE(slot))
> +             tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
> +     else if(TTS_IS_MINIMALTUPLE(slot))
> +             tuple = ((MinimalTupleTableSlot *) slot)->tuple;
> +     else if(TTS_IS_VIRTUAL(slot))

I think this embeds too much knowledge of the set of slot types in core
code. I don't see why it's needed either?


> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 414b6b4d57..2a1470a7b6 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
>       TM_IndexStatus *status;
>  } TM_IndexDeleteOp;
>
> +/* Holds table insert state. */
> +typedef struct TableInsertState

I suspect we should design it to be usable for updates and deletes in the
future, and thus name it TableModifyState.



> +{
> +     Relation        rel;
> +     /* Bulk insert state if requested, otherwise NULL. */
> +     struct BulkInsertStateData      *bistate;
> +     CommandId       cid;

Hm - I'm not sure it's a good idea to force the cid to be the same for all
inserts done via one TableInsertState.



> +     int     options;
> +     /* Below members are only used for multi inserts. */
> +     /* Array of buffered slots. */
> +     TupleTableSlot  **mi_slots;
> +     /* Number of slots that are currently buffered. */
> +     int32   mi_cur_slots;

> +     /*
> +      * Access method specific information such as parameters that are needed
> +      * for buffering and flushing decisions can go here.
> +      */
> +     void    *mistate;

I think we should instead have a generic TableModifyState, which each AM then
embeds into an AM specific AM state. Forcing two very related structs to be
allocated separately doesn't seem wise in this case.



> @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot 
> **slots, int nslots,
>                                                                 cid, options, 
> bistate);
>  }
>
> +static inline TableInsertState*
> +table_insert_begin(Relation rel, CommandId cid, int options,
> +                                bool alloc_bistate, bool is_multi)

Why have alloc_bistate and options?


> +static inline void
> +table_insert_end(TableInsertState *state)
> +{
> +     /* Deallocate bulk insert state here, since it's AM independent. */
> +     if (state->bistate)
> +             FreeBulkInsertState(state->bistate);
> +
> +     state->rel->rd_tableam->tuple_insert_end(state);
> +}

Seems like the order in here should be swapped?


Greetings,

Andres Freund

Re: New Table Access Methods for Multi and Single Inserts

Reply via email to