I lack time to give this a solid review, but here's a preliminary reaction:

On Sun, Apr 20, 2014 at 03:38:23PM -0400, Tom Lane wrote:
> On HEAD, this takes about 295-300 msec on my machine in a non-cassert
> build.  With the patch I sent previously, the time goes to 495-500 msec.

> This goes from circa 1250 ms in HEAD to around 1680 ms.
> Some poking around with oprofile says that much of the added time is
> coming from added syscache and typcache lookups, as I suspected.
> I did some further work on the patch to make it possible to cache
> the element tuple descriptor across calls for these two functions.
> With the attached patch, the first test case comes down to about 335 ms
> and the second to about 1475 ms (plpgsql is still leaving some extra
> cache lookups on the table).  More could be done with some further API
> changes, but I think this is about as much as is safe to back-patch.

That particular performance drop seems acceptable.  As I hinted upthread, the
large performance risk arises for test cases that do not visit a toast table
today but will do so after the change.

> At least in the accumArrayResult test case, it would be possible to
> bring the runtime back to very close to HEAD if we were willing to
> abandon the principle that compressed fields within a tuple should
> always get decompressed when the tuple goes into a larger structure.
> That would allow flatten_composite_element to quit early if the
> tuple doesn't have the HEAP_HASEXTERNAL infomask bit set.  I'm not
> sure that this would be a good tradeoff however: if we fail to remove nested
> compression, we could end up paying for that with double compression.
> On the other hand, it's unclear that that case would come up often,
> while the overhead of disassembling the tuple is definitely going to
> happen every time we assign to an array-of-composites element.  (Also,
> there is no question here of regression relative to previous releases,
> since the existing code fails to prevent nested compression.)

I wonder how it would work out to instead delay this new detoast effort until
toast_insert_or_update().  That timing has the major advantage of not adding
any toast table visits beyond those actually needed to avoid improperly
writing an embedded toast pointer to disk.  It would give us the flexibility
to detoast array elements more lazily than we do today, though I don't propose
any immediate change there.  To reduce syscache traffic, make the TupleDesc
record whether any column requires recursive detoast work.  Perhaps also use
the TupleDesc as a place to cache the recursive-detoast syscache lookups for
tables that do need them.  Thoughts?

Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to