On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <and...@anarazel.de> wrote:
> Hi, > > Thanks for the patches! > > On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote: > > I found couple of places where the zheap is using some extra logic in > > verifying > > whether it is zheap AM or not, based on that it used to took some extra > > decisions. > > I am analyzing all the extra code that is done, whether any callbacks can > > handle it > > or not? and how? I can come back with more details later. > > Yea, I think some of them will need to stay (particularly around > integrating undo) and som other ones we'll need to abstract. > OK. I will list all the areas that I found with my observation of how to abstract or leaving it and then implement around it. > > >> And then: > > >> - lotsa cleanups > > >> - rebasing onto a newer version of the abstract slot patchset > > >> - splitting out smaller patches > > >> > > >> > > >> You'd moved the bulk insert into tableam callbacks - I don't quite get > > >> why? There's not really anything AM specific in that code? > > >> > > > > > > The main reason of adding them to AM is just to provide a control to > > > the specific AM to decide whether they can support the bulk insert or > > > not? > > > > > > Current framework doesn't support AM specific bulk insert state to be > > > passed from one function to another and it's structure is fixed. This > needs > > > to be enhanced to add AM specific private members also. > > > > > > > Do you want me to work on it to make it generic to AM methods to extend > > the structure? > > I think the best thing here would be to *remove* all AM abstraction for > bulk insert, until it's actuallly needed. The likelihood of us getting > the interface right and useful without an actual user seems low. Also, > this already is a huge patch... > OK. Will remove them and share the patch. > > > @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, > EState *estate, > > CommandId mycid, int hi_options, > > ResultRelInfo *resultRelInfo, > > BulkInsertState bistate, > > - int nBufferedTuples, > TupleTableSlot **bufferedSlots, > > + int nBufferedSlots, TupleTableSlot > **bufferedSlots, > > uint64 firstBufferedLineNo); > > static bool CopyReadLine(CopyState cstate); > > static bool CopyReadLineText(CopyState cstate); > > @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate) > > void *bistate; > > uint64 processed = 0; > > bool useHeapMultiInsert; > > - int nBufferedTuples = 0; > > + int nBufferedSlots = 0; > > int prev_leaf_part_index = -1; > > > -#define MAX_BUFFERED_TUPLES 1000 > > +#define MAX_BUFFERED_SLOTS 1000 > > What's the point of these renames? We're still dealing in tuples. Just > seems to make the patch larger. > OK. I will correct it. > > > if (useHeapMultiInsert) > > { > > + int tup_size; > > + > > /* Add this tuple to the tuple > buffer */ > > - if (nBufferedTuples == 0) > > + if (nBufferedSlots == 0) > > + { > > firstBufferedLineNo = > cstate->cur_lineno; > > - > Assert(bufferedSlots[nBufferedTuples] == myslot); > > - nBufferedTuples++; > > + > > + /* > > + * Find out the Tuple size > of the first tuple in a batch and > > + * use it for the rest > tuples in a batch. There may be scenarios > > + * where the first tuple > is very small and rest can be large, but > > + * that's rare and this > should work for majority of the scenarios. > > + */ > > + tup_size = > heap_compute_data_size(myslot->tts_tupleDescriptor, > > + > myslot->tts_values, > > + > myslot->tts_isnull); > > + } > > This seems too exensive to me. I think it'd be better if we instead > used the amount of input data consumed for the tuple as a proxy. Does that > sound reasonable? > Yes, the cstate structure contains the line_buf member that holds the information of the line length of the row, this can be used as a tuple length to limit the size usage. comments? Regards, Haribabu Kommi Fujitsu Australia