On Thu, May 23, 2019 at 7:30 PM Ajin Cherian <itsa...@gmail.com> wrote:
> Hi Ashwin,
>
> - how to pass the "column projection list" to table AM? (as stated in
> initial email, currently we have modified table am API to pass the
> projection to AM)
>
> We were working on a similar columnar storage using pluggable APIs; one
> idea that we thought of was to modify the scan slot based on the targetlist
> to have only the relevant columns in the scan descriptor. This way the
> table AMs are passed a slot with only relevant columns in the descriptor.
> Today we do something similar to the result slot using
> ExecInitResultTypeTL(), now do it to the scan tuple slot as well. So
> somewhere after creating the scan slot using ExecInitScanTupleSlot(), call
> a table am handler API to modify the scan tuple slot based on the
> targetlist, a probable name for the new table am handler would be:
> exec_init_scan_slot_tl(PlanState *planstate, TupleTableSlot *slot).
>

Interesting. Though this reads as a hacky and not very clean approach to me.
Reasons:

- The memory allocation and initialization for the slot descriptor was
  already done in ExecInitScanTupleSlot(), so exec_init_scan_slot_tl() would
  redo a lot of that work. ExecInitScanTupleSlot() ideally just points to the
  tupleDesc from the Relation object. exec_init_scan_slot_tl(), however,
  would free the existing tupleDesc and allocate a fresh one. Plus, it can't
  point to the Relation's tuple desc but essentially needs to craft one.

- As discussed in thread [1], several places want to use different slots for
  the same scan, which means the descriptor would have to be modified every
  time on such occasions, even if it remains the same throughout the scan.
  Some extra code could be added to keep the old tuple descriptor around and
  reuse it for the next slot, but that again adds code complexity.

- The AM needs to know which attributes to scan in terms of the relation's
  attribute numbers. How would the tupledesc convey that? TupleDescData's
  attrs currently carries the info for attnum at attrs[attnum - 1].
  If TupleDesc needs to convey arbitrary attributes to scan, it seems this
  relationship has to be broken: attrs[offset] would describe some attribute
  of the relation, meaning offset != (attnum - 1). I am not sure how many
  places in the code rely on that relationship to look up information.

- The tupledesc provides a lot of information, not just the attribute numbers
  to scan. For example, it provides information in TupleConstr about the
  default value for a column. If the AM layer has to modify the existing
  slot's tupledesc, it would have to copy over such information as well.
  This information today is fetched using attnum as the offset into the
  constr->missing array. If this information is retained, how will the
  constr array be constructed? Will the array contain values only for the
  columns to scan, or will it contain the constr array as is from the
  Relation's tuple descriptor, as it does today? Constructing the constr
  array fresh seems like overhead, and not constructing it fresh seems to
  leave a mismatch between natts and the number of array elements.

It seems that with the proposed exec_init_scan_slot_tl() API, we would have
to call it after beginscan and before calling getnextslot to provide the
column projection list to the AM. The special dedicated API we have for
Zedstore to pass down the column projection list needs the same calling
convention, which is the reason I don't like it and am trying to find an
alternative. But at least the API we added for Zedstore seems much simpler,
more generic and more flexible in comparison, as it lets the AM decide what
it wishes to do with the list. The AM can fiddle with the slot's
TupleDescriptor if it wishes, or can handle the column projection some other
way.

> So this way the scan am handlers like getnextslot is passed a slot only
> having the relevant columns in the scan descriptor. One issue though is
> that the beginscan is not passed the slot, so if some memory allocation
> needs to be done based on the column list, it can't be done in beginscan.
> Let me know what you think.
>

Yes, ideally we would like to have this information available at beginscan,
if possible. But if it can't be, then it seems fine to delay such allocations
to the first calls to getnextslot and friends; that's how we do it today for
Zedstore.

[1]
https://www.postgresql.org/message-id/20190508214627.hw7wuqwawunhynj6%40alap3.anarazel.de
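
To make the "delay the allocation to the first getnextslot call" idea a bit
more concrete, here is a minimal standalone sketch. It is not PostgreSQL or
Zedstore code: DemoScanDesc and the demo_* functions are hypothetical names
made up for illustration, and the projection is assumed to arrive as a
boolean array indexed by (attnum - 1), so the AM still sees the relation's
own attribute numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an AM's scan descriptor; not a PostgreSQL struct. */
typedef struct DemoScanDesc
{
    int   natts;         /* number of attributes in the relation */
    bool *project;       /* project[attnum - 1]: column needed? NULL means all */
    int  *proj_attnums;  /* 1-based relation attnums to scan, built lazily */
    int   nproj;         /* number of valid entries in proj_attnums */
} DemoScanDesc;

/* beginscan analogue: the projection is unknown here, so nothing per-column
 * is allocated yet. */
DemoScanDesc *demo_beginscan(int natts)
{
    DemoScanDesc *scan;

    assert(natts > 0);
    scan = calloc(1, sizeof(*scan));
    if (scan == NULL)
        abort();
    scan->natts = natts;
    return scan;
}

/* Dedicated call made after beginscan and before the first getnextslot:
 * it only records which columns are wanted. */
void demo_set_projection(DemoScanDesc *scan, const bool *project)
{
    free(scan->project);
    scan->project = malloc((size_t) scan->natts * sizeof(bool));
    if (scan->project == NULL)
        abort();
    memcpy(scan->project, project, (size_t) scan->natts * sizeof(bool));
}

/* getnextslot analogue: the first call builds the projected attnum list;
 * later calls reuse it. Returns the number of columns that would be scanned. */
int demo_getnextslot(DemoScanDesc *scan)
{
    if (scan->proj_attnums == NULL)
    {
        scan->proj_attnums = malloc((size_t) scan->natts * sizeof(int));
        if (scan->proj_attnums == NULL)
            abort();
        scan->nproj = 0;
        for (int i = 0; i < scan->natts; i++)
        {
            /* Store the relation attnum (i + 1), not a renumbered offset. */
            if (scan->project == NULL || scan->project[i])
                scan->proj_attnums[scan->nproj++] = i + 1;
        }
    }
    return scan->nproj;
}

/* endscan analogue: release everything the scan allocated. */
void demo_endscan(DemoScanDesc *scan)
{
    free(scan->project);
    free(scan->proj_attnums);
    free(scan);
}
```

In this sketch the projected list keeps relation attnums (2 and 4 for a
four-column relation where only the second and fourth columns are wanted),
so nothing depends on renumbering attributes inside a tuple descriptor.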