On Wed, Sep 9, 2020 at 5:23 PM Amit Khandekar <amitdkhan...@gmail.com> wrote: > I went ahead and tried doing this. I chose an approach where we can > return the pointer to the in-place minimal tuple data if it's a > heap/buffer/minimal tuple slot. A new function > ExecFetchSlotMinimalTupleData() returns in-place minimal tuple data. > If it's neither heap, buffer or minimal tuple, it returns a copy as > usual. The receiver should not assume the data is directly taken from > MinimalTupleData, so it should set it's t_len to the number of bytes > returned. Patch attached > (0001-Avoid-redundant-tuple-copy-while-sending-tuples-to-G.patch)
+char * +ExecFetchSlotMinimalTupleData(TupleTableSlot *slot, uint32 *len, + bool *shouldFree) Interesting approach. It's a bit of a weird interface, returning a pointer to a non-valid MinimalTuple that requires extra tweaking after you copy it to make it a valid one and that you're not allowed to tweak in-place. I'd probably make that return "const void *" just for a bit of extra documentation. I wonder if there is a way we could make "Minimal Tuples but with the length travelling separately (and perhaps chopped off?)" into a first-class concept... It's also a shame to be schlepping a bunch of padding bytes around. tuple = (MinimalTuple) data; - Assert(tuple->t_len == nbytes); + tuple->t_len = nbytes; Hmm, so you have to scribble on shared memory on the receiving side. I wondered about a couple of different ways to share the length field with the shm_mq envelope, but that all seems a bit too weird... > Thomas, I guess you had a different approach in mind when you said > about "returning either success or > hey-that-buffer's-too-small-I-need-N-bytes". But what looks clear to Yeah I tried some things like that, but I wasn't satisfied with any of them; basically the extra work involved in negotiating the size was a bit too high. On the other hand, because my interface was "please write a MinimalTuple here!", it had the option to *form* a MinimalTuple directly in place, whereas your approach can only avoid creating and destroying a temporary tuple when the source is a heap tuple. > me is that avoiding the copy shows consistent improvement of 4 to 10% > for simple parallel table scans. I tried my patch on both x86_64 and > arm64, and observed this speedup on both. I think that's a great validation of the goal but I hope we can figure out a way that avoids the temporary tuple for more cases. FWIW I saw hash self joins running a couple of percent faster with one of my abandoned patches.