Re: Batching in executor

Junwang Zhao Tue, 03 Feb 2026 07:55:23 -0800

On Tue, Feb 3, 2026 at 9:30 PM cca5507 <[email protected]> wrote:
>
> Hi,
>
> Some comments for v5:
>
> 0001
> ====
>
> 1) heap_begin_batch()
>
> ```
>         /* Single allocation for HeapBatch header + tupdata array */
>         alloc_size = sizeof(HeapBatch) + sizeof(HeapTupleData) * maxitems;
>         hb = palloc(alloc_size);
>         hb->tupdata = (HeapTupleData *) ((char *) hb + sizeof(HeapBatch));
> ```
>
> Do we need a MAXALIGN() here to avoid unaligned access? Something like this:


TBH I don't think this single allocation helps too much, it's not on
the hot path,
but makes the code harder to read ;(

>
> ```
>         /* Single allocation for HeapBatch header + tupdata array */
>         alloc_size = MAXALIGN(sizeof(HeapBatch)) + sizeof(HeapTupleData) * 
> maxitems;
>         hb = palloc(alloc_size);
>         hb->tupdata = (HeapTupleData *) ((char *) hb + 
> MAXALIGN(sizeof(HeapBatch)));
> ```
>
> Or how about just using zero-length array:
>
> ```
> typedef struct HeapBatch
> {
>         Buffer                  buf;
>         int                             maxitems;
>         int                             nitems;
>         HeapTupleData   tupdata[FLEXIBLE_ARRAY_MEMBER];
> } HeapBatch;
>
> // and
> hb = palloc(offsetof(HeapBatch, tupdata) + sizeof(HeapTupleData) * maxitems);
> ```
>
> 2) pgstat_count_heap_getnext_batch()
>
> ```
> #define pgstat_count_heap_getnext_batch(rel, n)                               
>           \
>         do {                                                                  
>                                                   \
>                 if (pgstat_should_count_relation(rel))                        
>                   \
>                         (rel)->pgstat_info->counts.tuples_returned += n;      
>           \
>         } while (0)
> ```
>
> "+= n" -> "+= (n)", just like pgstat_count_index_tuples().
>
> 0002
> ====
>
> 1) TupleBatchCreate()
>
> ```
>         /* Single allocation for TupleBatch + inslots + outslots arrays */
>         alloc_size = sizeof(TupleBatch) + 2 * sizeof(TupleTableSlot *) * 
> capacity;
>         b = palloc(alloc_size);
>         inslots = (TupleTableSlot **) ((char *) b + sizeof(TupleBatch));
>         outslots = (TupleTableSlot **) ((char *) b + sizeof(TupleBatch) +
>                 sizeof(TupleTableSlot *) * capacity);
> ```
>
> Do we need a MAXALIGN() here to avoid unaligned access?
>
> 2) TupleBatchReset()
>
> ```
>         for (int i = 0; i < b->maxslots; i++)
>         {
>                 ExecClearTuple(b->inslots[i]);
>                 if (drop_slots)
>                         ExecDropSingleTupleTableSlot(b->inslots[i]);
>         }
> ```
>
> ExecDropSingleTupleTableSlot() will call ExecClearTuple(), so 
> ExecClearTuple() will be
> called twice if drop_slots is true, I think we can avoid this.
>
> 3) ScanCanUseBatching()
>
> In heap_beginscan(), we may disable page-at-a-time mode:
>
> ```
>         /*
>          * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
>          */
>         if (!(snapshot && IsMVCCSnapshot(snapshot)))
>                 scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
> ```
>
> It seems that ScanCanUseBatching() didn't consider this.
>
> 4) struct TupleBatch
>
> ```
>         struct TupleTableSlot **inslots; /* slots for tuples read "into" 
> batch */
>         struct TupleTableSlot **outslots; /* slots for tuples going "out of"
>                                                                            * 
> batch */
>         struct TupleTableSlot **activeslots;
> ```
>
> I think we can remove the word "struct".
>
> 5) ExecScanExtendedBatchSlot()
>
> ```
>                 /* Get next input slot from current batch, or refill */
>                 if (!TupleBatchHasMore(b))
>                 {
>                         if (!accessBatchMtd(node))
>                                 return NULL;
>                 }
> ```
>
> I think we cannot just return NULL here, see comments in ExecScanExtended():
>
> ```
>                 /*
>                  * if the slot returned by the accessMtd contains NULL, then 
> it means
>                  * there is nothing more to scan so we just return an empty 
> slot,
>                  * being careful to use the projection result slot so it has 
> correct
>                  * tupleDesc.
>                  */
>                 if (TupIsNull(slot))
>                 {
>                         if (projInfo)
>                                 return 
> ExecClearTuple(projInfo->pi_state.resultslot);
>                         else
>                                 return slot;
>                 }
> ```
>
> And why not just write this function like ExecScanExtended() and 
> ExecScanFetch()?
>
> --
> Regards,
> ChangAo Chen



-- 
Regards
Junwang Zhao

Re: Batching in executor

Reply via email to