On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat
<[email protected]> wrote:
> We may save the size of data in VirtualTupleTableSlot, thus avoiding
> the first loop. I assume that when allocating
> VirtualTupleTableSlot->data, we always know what size we are
> allocating so it should be just a matter of saving it in
> VirtualTupleTableSlot->size. This should avoid the first loop in
> tts_virtual_materialize() and give some speed up. We will need a loop
> to repoint non-byval Datums. I imagine that the pointers to non-byval
> Datums can be computed as dest_slot->tts_values[natts] =
> dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data).
> This would work as long as all the non-byval datums in the source slot
> are all stored flattened in source slot's data. I am assuming that
> that would be true in a materialized virtual slot. The byval datums
> are copied as is. I think, this will avoid multiple memcpy calls, one
> per non-byval attribute and hence some speedup. I may be wrong though.
hmm, do you think it's common enough that we copy an already
materialised virtual slot?
I tried adding the following code totts_virtual_copyslot and didn't
see the NOTICE message when running each of your test queries. "make
check" also worked without anything failing after adjusting
nodeUnique.c to always use a TTSOpsVirtual slot.
+ if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot))
+ elog(NOTICE, "We copied a materialized virtual slot!");
I did get a failure in postgres_fdw's tests:
server loopback options (table_name 'tab_batch_sharded_p1_remote');
insert into tab_batch_sharded select * from tab_batch_local;
+NOTICE: We copied a materialized virtual slot!
+NOTICE: We copied a materialized virtual slot!
so I think it's probably not that common that we'd gain from that optimisation.
What might buy us a bit more would be to get rid of the for loop
inside tts_virtual_copyslot() and copy the guts of
tts_virtual_materialize() into tts_virtual_copyslot() and set the
dstslot tts_isnull and tts_values arrays in the same loop that we use
to calculate the size.
I tried that in the attached patch and then tested it alongside the
patch that changes the slot type.
master = 74604a37f
1 = [1]
2 = optimize_tts_virtual_copyslot.patch
Using the script from [2] and the setup from [3] but reduced to 10k
tuples instead of 1 million.
Times the average query time in milliseconds for a 60 second pgbench run.
query master master+1 master+1+2 m/m+1 m/m+1+2
Q1 2.616 2.082 1.903 125.65%
137.47%
Q2 9.479 10.593 10.361 89.48%
91.49%
Q3 10.329 10.357 10.627 99.73%
97.20%
Q4 7.272 7.306 6.941 99.53%
104.77%
Q5 7.597 7.043 6.645 107.87%
114.33%
Q6 162.177 161.037 162.807 100.71% 99.61%
Q7 59.288 59.43 61.294 99.76%
96.73%
103.25% 105.94%
I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but
did, so the results might not be stable enough to warrant making any
decisions from.
I was uncertain if the old behaviour of when srcslot contains fewer
attributes than dstslot was intended or not. What happens there is
that we'd leave the additional old dstslot tts_values in place and
only overwrite up to srcslot->natts but then we'd go on and
materialize all the dstslot attributes. I think this might not be
needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may
be ok just to materialize the srcslot attributes and ignore the
previous additional dstslot attributes. Changing it to that would
make the attached patch more simple.
David
[1]
https://www.postgresql.org/message-id/attachment/151110/use_subnode_slot_type_for_nodeunique.patch
[2] https://www.postgresql.org/message-id/attachment/151342/uniquebench.sh.txt
[3]
https://www.postgresql.org/message-id/CAExHW5uhTMdkk26oJg9f2ZVufbi5J4Lquj79MdSO%2BipnGJ_muw%40mail.gmail.com
diff --git a/src/backend/executor/execTuples.c
b/src/backend/executor/execTuples.c
index 2c2712ceac..4f8c6c24c0 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -251,7 +251,10 @@ tts_virtual_materialize(TupleTableSlot *slot)
static void
tts_virtual_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
- TupleDesc srcdesc = srcslot->tts_tupleDescriptor;
+ TupleDesc srcdesc = srcslot->tts_tupleDescriptor;
+ TupleDesc dstdesc = dstslot->tts_tupleDescriptor;
+ Size sz = 0;
+ char *data;
Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);
@@ -259,17 +262,105 @@ tts_virtual_copyslot(TupleTableSlot *dstslot,
TupleTableSlot *srcslot)
slot_getallattrs(srcslot);
- for (int natt = 0; natt < srcdesc->natts; natt++)
+ /* make sure storage doesn't depend on external memory */
+ for (int natt = 0; natt < dstdesc->natts; natt++)
{
- dstslot->tts_values[natt] = srcslot->tts_values[natt];
- dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt];
+ Form_pg_attribute att = TupleDescAttr(dstdesc, natt);
+ Datum val;
+ bool isnull;
+
+ /* copy the values/isnull from the srcslot */
+ if (natt < srcdesc->natts)
+ {
+ dstslot->tts_values[natt] = val =
srcslot->tts_values[natt];
+ dstslot->tts_isnull[natt] = isnull =
srcslot->tts_isnull[natt];
+ }
+ else
+ {
+ /*
+ * when dstslot has more ntts than srcslot we must
materialize
+ * the values from dstslot
+ */
+ val = dstslot->tts_values[natt];
+ isnull = dstslot->tts_isnull[natt];
+ }
+
+ if (att->attbyval || isnull)
+ continue;
+
+ if (att->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * We want to flatten the expanded value so that the
materialized
+ * slot doesn't depend on it.
+ */
+ sz = att_align_nominal(sz, att->attalign);
+ sz += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ sz = att_align_nominal(sz, att->attalign);
+ sz = att_addlength_datum(sz, att->attlen, val);
+ }
+ }
+
+ /* do we need to materialize any columns? */
+ if (sz > 0)
+ {
+ VirtualTupleTableSlot *vdstslot = (VirtualTupleTableSlot *)
dstslot;
+
+ /* allocate memory */
+ vdstslot->data = data = MemoryContextAlloc(dstslot->tts_mcxt,
sz);
+ dstslot->tts_flags |= TTS_FLAG_SHOULDFREE;
+
+ /* and copy all attributes into the pre-allocated space */
+ for (int natt = 0; natt < dstdesc->natts; natt++)
+ {
+ Form_pg_attribute att = TupleDescAttr(dstdesc, natt);
+ Datum val;
+
+ if (att->attbyval || dstslot->tts_isnull[natt])
+ continue;
+
+ val = dstslot->tts_values[natt];
+
+ if (att->attlen == -1 &&
+
VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ Size data_length;
+
+ /*
+ * We want to flatten the expanded value so
that the
+ * materialized slot doesn't depend on it.
+ */
+ ExpandedObjectHeader *eoh = DatumGetEOHP(val);
+
+ data = (char *) att_align_nominal(data,
att->attalign);
+ data_length = EOH_get_flat_size(eoh);
+ EOH_flatten_into(eoh, data, data_length);
+
+ dstslot->tts_values[natt] =
PointerGetDatum(data);
+ data += data_length;
+ }
+ else
+ {
+ Size data_length = 0;
+
+ data = (char *) att_align_nominal(data,
att->attalign);
+ data_length = att_addlength_datum(data_length,
att->attlen,
+
val);
+
+ memcpy(data, DatumGetPointer(val), data_length);
+
+ dstslot->tts_values[natt] =
PointerGetDatum(data);
+ data += data_length;
+ }
+ }
}
dstslot->tts_nvalid = srcdesc->natts;
dstslot->tts_flags &= ~TTS_FLAG_EMPTY;
-
- /* make sure storage doesn't depend on external memory */
- tts_virtual_materialize(dstslot);
}
static HeapTuple