On 06-01-2021 14:06, Bharath Rupireddy wrote:
On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <l...@swarm64.com> wrote:
The main reason I want a single API is that I would like the decision of
using single or multi inserts to move inside the tableam. For a heap
insert, for example, we might want to put the threshold at around 100
rows so that the overhead of buffering the tuples is actually
compensated. For another tableam this logic might be quite different,
and I therefore think it shouldn't be e.g. COPY or CTAS deciding
whether or not multi inserts should be used. Otherwise what we'll get
is tableams that ignore this flag and do their own thing anyway. I'd
rather have an API that gives all necessary information to the tableam
and then lets the tableam do "the right thing".

Another reason I'm suggesting this API is that I would expect the begin
to be called in a different place in the code than the actual inserts.
Conceptually, the begin and end are to me like the executor begin and
end: you prepare the inserts with the knowledge you have at that point.
I assumed (wrongly?) that at the start of the statement one knows best
how many rows are coming; the actual insertion of a row then no longer
has to deal with choosing between multi and single inserts, or when to
buffer, because that information has already been given during the
initial phase. One of the reasons this is appealing to me is that e.g.
in [1] there was discussion on when to switch to a multi insert state,
and imo this should be up to the tableam.
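
To sketch what I mean (all names and fields below are placeholders, not
the posted patch):

TableInsertState *
heap_insert_begin(Relation rel, CommandId cid, int options,
                  double expected_rows)
{
    TableInsertState *state = palloc0(sizeof(TableInsertState));

    state->rel = rel;
    state->cid = cid;
    state->options = options;

    /*
     * The tableam decides: only set up buffering when enough rows are
     * expected to compensate for the copying overhead; a negative value
     * means "unknown / streaming".
     */
    if (expected_rows < 0 || expected_rows >= 100)
        state->mistate = heap_multi_insert_init(state); /* hypothetical helper */

    return state;
}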

Agree that whether to go with multi or single inserts should be
completely left to the tableam implementation; we, as callers of those
APIs, just need to inform it whether we expect single or multiple rows,
and it should be left to the tableam implementation whether to actually
go with buffering or single inserts. ISTM that it's an elegant way of
making the API generic and abstracting everything from the callers.
What I wonder is how we can know in advance the expected row count that
we need to pass in to heap_insert_begin(). IIUC, we cannot estimate the
upcoming rows in COPY, Insert Into Select, Refresh Mat View, or some
other insert queries. Of course, we can look at the planner's estimated
row count for the selects in COPY, Insert Into Select or Refresh Mat
View after planning, but to me that's not something we can depend on to
pass a row count to the insert APIs.

When we don't know the expected row count, why can't we (as callers of
the APIs) tell the APIs something like: "I'm intending to perform multi
inserts, so if possible and if you have a mechanism to buffer the
slots, do it; otherwise insert the tuples one by one, or else do
whatever you want to do with the tuples I give you." So, in the case of
COPY we can ask the API for multi inserts and call heap_insert_begin()
and heap_insert_v2().
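
From the caller's side this could look roughly like the following
(signatures illustrative only):

TableInsertState *istate;
TupleTableSlot   *slot;

/* COPY asks for multi inserts; the AM may still decide not to buffer. */
istate = table_insert_begin(rel, GetCurrentCommandId(true), ti_options,
                            /* is_multi */ true);

while ((slot = copy_get_next_slot()) != NULL)   /* stand-in for the COPY loop */
    table_insert_v2(istate, slot);              /* buffers or inserts directly */

table_insert_end(istate);                       /* flushes whatever is left */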


I thought that when it is available (because of planning) it would be nice to pass it in. If you don't know, you could pass in 1 for doing single inserts, and e.g. -1 or max-int for streaming. The reason I proposed it is so that tableams have as much knowledge as possible to do the right thing. is_multi does also work of course, but is just somewhat less informative.

What to me seemed somewhat counterintuitive is that with the proposed API it is possible to say is_multi=true and then still call heap_insert_v2 to do a single insert.

Given the above explanation, I still feel bool is_multi would suffice.

Thoughts?

On dynamically switching from single to multi inserts: this can be done
by heap_insert_v2 itself. The way I think it's possible is that, say,
we have some threshold row count of 1000 (can be a macro); after
inserting that many tuples, heap_insert_v2 can switch to buffering
mode.
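
For instance (an illustrative sketch only; the counter field and the
heap_multi_insert_init helper are made up):

#define HEAP_MULTI_INSERT_SWITCH_THRESHOLD  1000

void
heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
    /* After enough single inserts, silently switch to buffering mode. */
    if (state->mistate == NULL &&
        ++state->num_single_inserts >= HEAP_MULTI_INSERT_SWITCH_THRESHOLD)
        state->mistate = heap_multi_insert_init(state);

    if (state->mistate != NULL)
        heap_multi_insert_v2(state, slot);      /* buffer, flush when full */
    else
    {
        bool        shouldFree;
        HeapTuple   tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

        heap_insert(state->rel, tuple, state->cid, state->options,
                    state->bistate);
        if (shouldFree)
            pfree(tuple);
    }
}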

For that I thought it'd be good to use the expected row count, but yeah dynamically switching also works and might work better if the expected row counts are usually off.


Thoughts?

Which would make the code something like:

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
         TupleTableSlot  *batchslot;
         HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
         Size sz;

         Assert(mistate && mistate->slots);

         if (mistate->slots[mistate->cur_slots] == NULL)
                 mistate->slots[mistate->cur_slots] =
                         table_slot_create(state->rel, NULL);

         batchslot = mistate->slots[mistate->cur_slots];

         ExecClearTuple(batchslot);
         ExecCopySlot(batchslot, slot);

         /*
          * Calculate the tuple size after the original slot is copied, because
          * the copied slot type and the tuple size may change.
          */
         sz = GetTupleSize(batchslot, mistate->max_size);

         Assert(sz > 0);

         mistate->cur_slots++;
         mistate->cur_size += sz;

         if (mistate->cur_slots >= mistate->max_slots ||
                 mistate->cur_size >= mistate->max_size)
                 heap_multi_insert_flush(state);
}
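
And the flush counterpart could look roughly like this (a sketch only;
the cid, options, and bistate fields in TableInsertState are
assumptions):

void
heap_multi_insert_flush(TableInsertState *state)
{
    HeapMultiInsertState *mistate = (HeapMultiInsertState *) state->mistate;

    if (mistate->cur_slots == 0)
        return;

    /* Insert all buffered tuples in one shot. */
    heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
                      state->cid, state->options, state->bistate);

    mistate->cur_slots = 0;
    mistate->cur_size = 0;
    state->flushed = true;      /* visible to callers via TableInsertState */
}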

I think clearing tuples before copying the slot, as you suggested, may
work without the need for a clear_slots flag.

ok, cool :)



  > Also, why do we want to do ExecClearTuple() anyway? Isn't
  > it good enough that the next call to ExecCopySlot will effectively clear
  > it out?

For virtual, heap, and minimal tuple slots, yes, ExecCopySlot clears
the destination slot before copying. But for buffer heap slots,
tts_buffer_heap_copyslot does not always clear the destination slot,
see below. If we fall into the else branch, we might get some issues.
Also note that once the slot is cleared in ExecClearTuple, it will not
be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
false. That is why I'd keep the ExecClearTuple call as is.

I had no idea the buffer heap slot doesn't unconditionally clear out the
slot :( So yes, let's call it unconditionally ourselves. See also the
suggestion above.

Yeah, we will clear the tuple slot before copying, to be on the safer side.


ok

      /*
       * If the source slot is of a different kind, or is a buffer slot that
       * has been materialized / is virtual, make a new copy of the tuple.
       * Otherwise make a new reference to the in-buffer tuple.
       */
      if (dstslot->tts_ops != srcslot->tts_ops ||
          TTS_SHOULDFREE(srcslot) ||
          !bsrcslot->base.tuple)
      {
          MemoryContext oldContext;

          ExecClearTuple(dstslot);
          ...
      }
      else
      {
          Assert(BufferIsValid(bsrcslot->buffer));

          tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
                                      bsrcslot->buffer, false);
          ...
      }
  > - flushed -> why is this a stored boolean? isn't this indirectly encoded
  > by cur_slots/cur_size == 0?

Note that cur_slots is in HeapMultiInsertState, and outside of the new
APIs, i.e. in TableInsertState, mistate is a void pointer, so we can't
really access cur_slots there. We could, but we'd have to dereference it
based on the tableam kind. Instead of doing all that, to keep the API
cleaner, I chose to have a boolean in TableInsertState which can be
seen and used outside of the new APIs. Hope that's fine.
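
To recap the layout being discussed (only the fields that have come up
in this thread; everything else elided):

typedef struct TableInsertState
{
    Relation    rel;
    bool        flushed;    /* did the last call flush the buffered slots? */
    void       *mistate;    /* AM-specific state, NULL for single inserts */
    /* ... other fields such as cid, options, bistate ... */
} TableInsertState;

typedef struct HeapMultiInsertState
{
    TupleTableSlot **slots; /* buffered slots */
    int         max_slots;  /* allocated length of slots[] */
    int         cur_slots;  /* number of currently buffered slots */
    Size        max_size;   /* flush when this much memory is buffered */
    Size        cur_size;   /* memory used by the buffered tuples */
} HeapMultiInsertState;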

So you mean the flushed variable is actually there to tell the user of
the API that they are supposed to call flush before end? Why can't the
end call flush itself then? I guess I completely misunderstood the
purpose of table_multi_insert_flush being public. I had assumed it was
there so the usage site could indicate that now would be a good time to
flush, e.g. because a statement is ending or something. I had not
understood that it's always required to do
table_multi_insert_flush + table_insert_end.
IMHO I would hide this from the caller, given that you would only
really call flush yourself right before calling end, or are there other
cases where one would be required to explicitly call flush?
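
I.e. something along these lines (just a sketch; the bistate field is
an assumption):

void
heap_insert_end(TableInsertState *state)
{
    /* Flush any remaining buffered tuples so callers never have to. */
    if (state->mistate != NULL)
        heap_multi_insert_flush(state);

    if (state->bistate != NULL)
        FreeBulkInsertState(state->bistate);

    pfree(state);
}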

We need to know outside the multi_insert API whether the buffered
slots, in case of multi inserts, have been flushed. The reason is that
if we have indexes or after row triggers, currently we call
ExecInsertIndexTuples or ExecARInsertTriggers on the buffered slots
outside the API, in a loop after the flush.

If we agree on removing the heap_multi_insert_v2 API and embedding that
logic inside heap_insert_v2, then we can do this: pass the required
information and the functions ExecInsertIndexTuples and
ExecARInsertTriggers as callbacks, so that, whether heap_insert_v2
chooses single or multi inserts, it can call back into these functions
with the required information after the flush. We can add the callback
and the required information to TableInsertState. But I'm not quite
sure we want to turn ExecInsertIndexTuples and ExecARInsertTriggers
into callbacks.

If we don't want to go the callback way, then at least we need to know
whether or not heap_insert_v2 has chosen multi inserts, and if so we
need the buffered slots array, the number of currently buffered slots,
and whether they have been flushed, all in the TableInsertState. Then,
eventually, we might need all the HeapMultiInsertState info in the
TableInsertState.


To me the callback API seems cleaner: on heap_insert_begin we can pass in a callback that is called for every flushed slot, or only on multi-insert flushes. Is there a reason it would only be done for multi-insert flushes, or can it be generic?
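
Roughly like this (hypothetical names; the two fields would live in
TableInsertState):

typedef void (*TableInsertCallback) (TupleTableSlot *slot, void *cb_arg);

/* In TableInsertState:
 *     TableInsertCallback flush_cb;       -- e.g. does index + AR trigger work
 *     void               *flush_cb_arg;
 */

/* At the end of heap_multi_insert_flush(), before resetting the buffer: */
if (state->flush_cb != NULL)
{
    for (int i = 0; i < mistate->cur_slots; i++)
        state->flush_cb(mistate->slots[i], state->flush_cb_arg);
}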

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Hi,

Replied inline.

Kind regards,
Luc

