On Thu, May 23, 2019 at 7:30 PM Ajin Cherian <itsa...@gmail.com> wrote:

> Hi Ashwin,
>
> - how to pass the "column projection list" to table AM? (as stated in
>   initial email, currently we have modified table am API to pass the
>   projection to AM)
>
> We were working on a similar columnar storage using pluggable APIs; one
> idea that we thought of was to modify the scan slot based on the targetlist
> to have only the relevant columns in the scan descriptor. This way the
> table AMs are passed a slot with only relevant columns in the descriptor.
> Today we do something similar to the result slot using
> ExecInitResultTypeTL(), now do it to the scan tuple slot as well. So
> somewhere after creating the scan slot using ExecInitScanTupleSlot(), call
> a table am handler API to modify the scan tuple slot based on the
> targetlist, a probable name for the new table am handler would be:
> exec_init_scan_slot_tl(PlanState *planstate, TupleTableSlot *slot).
>

Interesting.

Though this reads as a hacky and not very clean approach to me. Reasons:

- The memory allocation and initialization for the slot descriptor is
  already done in ExecInitScanTupleSlot(), so exec_init_scan_slot_tl()
  would redo a lot of that work. ExecInitScanTupleSlot() ideally just
  points to the tupleDesc from the Relation object, whereas
  exec_init_scan_slot_tl() would have to free the existing tupleDesc
  and allocate a fresh one. Plus, it can't point to the Relation's
  tuple desc but essentially needs to craft a new one.

- As discussed in thread [1], several places want to use different
  slots for the same scan, which means the descriptor would have to be
  modified on every such occasion even though it stays the same
  throughout the scan. Some extra code could be added to keep the old
  tuple descriptor around and reuse it for the next slot, but that
  again adds code complexity.

- The AM needs to know which columns to scan in terms of the
  relation's attribute numbers. How would the tupledesc convey that?
  Today, TupleDescData's attrs carries the info for attnum at
  attrs[attnum - 1]. If the TupleDesc needs to convey an arbitrary
  subset of attributes to scan, this relationship has to be broken:
  attrs[offset] would describe some attribute of the relation with
  offset != (attrs[offset].attnum - 1). I am not sure how many places
  in the code rely on that relationship to look up information.

- The tupledesc provides a lot of information, not just the attribute
  numbers to scan. For example, its TupleConstr carries the default
  values for columns. If the AM layer has to modify the existing
  slot's tupledesc, it would have to copy over such information as
  well. Today this information is fetched using attnum as the offset
  into the constr->missing array. If this information is retained,
  how will the constr array be constructed? Will it contain values
  only for the columns to scan, or will it be the constr array as-is
  from the Relation's tuple descriptor, as it is today? Constructing
  the array fresh seems like overhead, and not constructing it fresh
  leaves a mismatch between natts and the number of array elements.
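
To make the attnum point above concrete, here is a rough standalone
sketch. MiniAttr, MiniTupleDesc, desc_is_positional() and
project_desc() are made-up simplified stand-ins for illustration, not
PostgreSQL types or functions. It only shows that a descriptor holding
a subset of columns no longer satisfies attrs[attnum - 1].

```c
#include <assert.h>
#include <stddef.h>

#define MINI_MAX_ATTS 8

/* Hypothetical stand-in for one attribute entry of a tuple descriptor. */
typedef struct MiniAttr
{
    int attnum;                 /* 1-based attribute number in the relation */
} MiniAttr;

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct MiniTupleDesc
{
    int      natts;
    MiniAttr attrs[MINI_MAX_ATTS];
} MiniTupleDesc;

/*
 * Returns 1 if attrs[i].attnum == i + 1 for every entry, i.e. the
 * attrs[attnum - 1] lookup convention is still valid; 0 otherwise.
 */
int
desc_is_positional(const MiniTupleDesc *desc)
{
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attnum != i + 1)
            return 0;
    }
    return 1;
}

/* Build a descriptor containing only the requested relation attnums. */
MiniTupleDesc
project_desc(const MiniTupleDesc *rel, const int *attnums, int n)
{
    MiniTupleDesc out = { .natts = 0 };

    assert(n >= 0 && n <= MINI_MAX_ATTS);
    for (int i = 0; i < n; i++)
    {
        assert(attnums[i] >= 1 && attnums[i] <= rel->natts);
        /* Copy the entry; it keeps its relation attnum but moves position. */
        out.attrs[out.natts++] = rel->attrs[attnums[i] - 1];
    }
    return out;
}
```

Projecting columns 2 and 4 of a four-column relation puts attnum 2 at
offset 0, so any code doing attrs[attnum - 1] on the projected
descriptor would read the wrong entry or run past natts.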

It seems the proposed exec_init_scan_slot_tl() API would have to be
called after beginscan and before getnextslot to provide the column
projection list to the AM. The dedicated API we have in Zedstore to
pass down the column projection list needs the same calling
convention, which is the reason I don't like it and am trying to find
an alternative. But at least the API we added for Zedstore seems much
simpler, more generic and more flexible in comparison, as it lets the
AM decide what it wishes to do with the list. The AM can fiddle with
the slot's TupleDescriptor if it wishes, or can handle the column
projection some other way.
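
As a rough illustration of "let the AM decide", here is a standalone
sketch of an optional projection callback. It is not the actual
Zedstore patch or the real table AM API: MiniScan, MiniAmRoutine,
scan_set_projection and the other names are assumptions made up for
this example. The projection is a bool array indexed by attnum - 1, so
the relation's attribute numbering stays intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MINI_MAX_ATTS 8

/* Hypothetical scan state; not PostgreSQL's TableScanDesc. */
typedef struct MiniScan
{
    int  natts;                     /* attributes in the relation */
    bool fetch[MINI_MAX_ATTS];      /* fetch[attnum - 1]: read this column? */
} MiniScan;

/* Hypothetical AM callback table with one optional projection hook. */
typedef struct MiniAmRoutine
{
    void (*scan_set_projection) (MiniScan *scan, const bool *proj);
} MiniAmRoutine;

/* A column store would remember exactly which columns were asked for. */
void
colstore_set_projection(MiniScan *scan, const bool *proj)
{
    for (int i = 0; i < scan->natts; i++)
        scan->fetch[i] = proj[i];
}

/* Caller side: begin the scan, then pass the projection if the AM wants it. */
void
mini_begin_scan(const MiniAmRoutine *am, MiniScan *scan,
                int natts, const bool *proj)
{
    assert(natts >= 0 && natts <= MINI_MAX_ATTS);
    scan->natts = natts;
    for (int i = 0; i < natts; i++)
        scan->fetch[i] = true;      /* default: read every column */

    /* An AM that does not care simply leaves the hook NULL. */
    if (am->scan_set_projection != NULL)
        am->scan_set_projection(scan, proj);
}

/* Count how many columns the scan will actually read. */
int
mini_columns_to_read(const MiniScan *scan)
{
    int n = 0;

    for (int i = 0; i < scan->natts; i++)
    {
        if (scan->fetch[i])
            n++;
    }
    return n;
}
```

With this shape a row-oriented AM leaves the hook unset and keeps
reading all columns, while a columnar AM narrows the scan, and neither
has to touch the slot's tuple descriptor.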

> So this way the scan am handlers like getnextslot is passed a slot only
> having the relevant columns in the scan descriptor. One issue though is
> that the beginscan is not passed the slot, so if some memory allocation
> needs to be done based on the column list, it can't be done in beginscan.
> Let me know what you think.
>

Yes, ideally I would like to have this information available at
beginscan if possible. But if it can't be, then it seems fine to delay
such allocations to the first call to getnextslot and friends; that's
how we do it today for Zedstore.
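
A minimal sketch of that deferred allocation, assuming the projection
arrives some time after beginscan. LazyScan and the lazy_* functions
are hypothetical names for this example only (not Zedstore code), and
the fixed 16-int buffer is an arbitrary placeholder size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MINI_MAX_ATTS 8

/* Hypothetical scan state with per-column buffers allocated lazily. */
typedef struct LazyScan
{
    int   natts;
    int   nrows;                    /* rows the scan will return */
    int   next_row;
    bool  initialized;              /* buffers allocated yet? */
    bool  fetch[MINI_MAX_ATTS];     /* fetch[attnum - 1]: read this column? */
    int  *colbuf[MINI_MAX_ATTS];    /* NULL until the first getnextslot */
} LazyScan;

/* beginscan: the projection is not known here, so allocate nothing. */
void
lazy_begin_scan(LazyScan *scan, int natts, int nrows)
{
    assert(natts >= 0 && natts <= MINI_MAX_ATTS);
    scan->natts = natts;
    scan->nrows = nrows;
    scan->next_row = 0;
    scan->initialized = false;
    for (int i = 0; i < natts; i++)
    {
        scan->fetch[i] = true;
        scan->colbuf[i] = NULL;
    }
}

/* Called after beginscan and before the first getnextslot. */
void
lazy_set_projection(LazyScan *scan, const bool *proj)
{
    assert(!scan->initialized);
    for (int i = 0; i < scan->natts; i++)
        scan->fetch[i] = proj[i];
}

/* Returns true if a row was produced; allocates buffers on the first call. */
bool
lazy_getnextslot(LazyScan *scan)
{
    if (!scan->initialized)
    {
        for (int i = 0; i < scan->natts; i++)
        {
            if (!scan->fetch[i])
                continue;           /* skipped columns never get a buffer */
            scan->colbuf[i] = calloc(16, sizeof(int));
            if (scan->colbuf[i] == NULL)
                abort();
        }
        scan->initialized = true;
    }

    if (scan->next_row >= scan->nrows)
        return false;
    scan->next_row++;
    return true;
}

/* endscan: release whatever was allocated. */
void
lazy_end_scan(LazyScan *scan)
{
    for (int i = 0; i < scan->natts; i++)
    {
        free(scan->colbuf[i]);
        scan->colbuf[i] = NULL;
    }
    scan->initialized = false;
}
```

The cost is one extra branch per getnextslot call, in exchange for
not needing the column list at beginscan time.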

[1]
https://www.postgresql.org/message-id/20190508214627.hw7wuqwawunhynj6%40alap3.anarazel.de
