On 17.05.2018 05:19, Andres Freund wrote:
On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
David Rowley <david.row...@2ndquadrant.com> writes:
On 17 May 2018 at 11:00, Andres Freund <and...@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.
While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.
Yeah, that scares me too.  We'd then be in a situation where (arguably)
any relation extension should force a relcache inval.  Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.
I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently

I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too.  AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept.  Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.
A syscall is several hundred instructions. An unlocked read - which'll
be be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles.  So I can't see how it'd not improve the contention.

But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.


Andres Freund

I completely agree with Andreas. In my multithreaded Postgres prototype file description cache (shared by all threads) becomes bottleneck exactly because of each query execution requires access to file system (lseek) to provide optimizer estimation of the relation size, despite to the fact that all database fits in memory. Well, this is certainly specific of shared descriptor's pool in my prototype, but the fact the we have to perform lseek at each query compilation seems to be annoying in any case.

And there is really no problem that cached relation size estimation is not precise.  It really can be invalidated even if relation size is changed more than some threshold value (1Mb?) or lease time for cached value is expired. May be it is reasonable to implement specific invalidation for relation size esimation, to avoid complete invalidation and reconstruction of relation description and all dependent objects. In this case time-based invalidation seems to be the easiest choice to implement. Repeating lseek each 10 or 1 second seems to have no noticeable impact on performance and relation size can not dramatically changed during this time.

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply via email to