On 17.05.2018 05:19, Andres Freund wrote:
On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
David Rowley <david.row...@2ndquadrant.com> writes:
On 17 May 2018 at 11:00, Andres Freund <and...@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there are plenty of
workloads where the lseek(SEEK_END) calls show up pretty prominently.
While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.
Yeah, that scares me too. We'd then be in a situation where (arguably)
any relation extension should force a relcache inval. Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.
I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.
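
For illustration, the 10% rule proposed above could be sketched roughly as follows. This is only a sketch under assumptions: the function name, the block-count representation, and the rounding of the threshold up to one block are all hypothetical and are not taken from PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a PostgreSQL function.
 *
 * cached_nblocks: size (in blocks) that was current at the last invalidation
 * new_nblocks:    size (in blocks) after the extension being performed
 *
 * Returns true when the relation has grown by at least 10% since the last
 * invalidation, i.e. when a new invalidation should be sent.
 */
static bool
extension_crosses_threshold(uint32_t cached_nblocks, uint32_t new_nblocks)
{
    /* Required growth: 10% of the cached size, but never less than 1 block. */
    uint64_t slack = cached_nblocks / 10;

    if (slack == 0)
        slack = 1;

    /* 64-bit arithmetic avoids overflow near the top of the uint32 range. */
    return (uint64_t) new_nblocks >= (uint64_t) cached_nblocks + slack;
}
```

With this rule every extension of a relation smaller than 20 blocks triggers an invalidation, while a 1000-block relation triggers one only after growing by another 100 blocks.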
I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too. AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept. Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.
A syscall is several hundred instructions. An unlocked read - which'll
be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles. So I can't see how it'd not improve the contention.
But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.
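
To make the cost comparison concrete, here is a minimal sketch of an unlocked read of a size value kept in shared memory, written with plain C11 atomics. The struct and function names are hypothetical, and real PostgreSQL code would go through its own atomics layer rather than <stdatomic.h>; the sketch only shows the shape of the idea.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-relation slot that would live in shared memory. */
typedef struct SharedRelSize
{
    _Atomic uint32_t nblocks;   /* current relation size, in blocks */
} SharedRelSize;

/* Writer side: publish the new size after the relation was extended. */
static void
shared_relsize_set(SharedRelSize *slot, uint32_t nblocks)
{
    atomic_store_explicit(&slot->nblocks, nblocks, memory_order_release);
}

/*
 * Planner side: an unlocked read. The value may already be stale when it
 * is used, which is acceptable for an estimate. No lock, no system call.
 */
static uint32_t
shared_relsize_estimate(SharedRelSize *slot)
{
    return atomic_load_explicit(&slot->nblocks, memory_order_relaxed);
}

/*
 * Read with a barrier: pairs with the release store above, for callers
 * that must also see everything written before the size was published.
 */
static uint32_t
shared_relsize_acquire(SharedRelSize *slot)
{
    return atomic_load_explicit(&slot->nblocks, memory_order_acquire);
}
```

The relaxed load corresponds to the "few cycles" case and the acquire load to the "with a barrier" case; neither enters the kernel.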
Greetings,
Andres Freund
I completely agree with Andres. In my multithreaded Postgres prototype,
the file descriptor cache (shared by all threads) becomes a bottleneck
precisely because each query execution requires
access to the file system (lseek) to give the optimizer an estimate of the
relation size, even though the whole database fits in memory.
This is certainly specific to the shared descriptor pool in my
prototype, but the fact that we have to perform an lseek at each query
compilation seems annoying in any case.
And there is really no problem if the cached relation size estimate is
not precise. It can be invalidated only when the relation size has
changed by more than some threshold value (1MB?) or when the lease time
of the cached value has expired.
Maybe it is reasonable to implement a dedicated invalidation for the relation
size estimate, to avoid complete invalidation and reconstruction of the
relation descriptor and all dependent objects.
In this case, time-based invalidation seems to be the easiest choice to
implement. Repeating the lseek every 10 seconds or every second seems to
have no noticeable impact on performance, and the relation size cannot
change dramatically during this time.
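
A rough sketch of what such a time-based cache could look like is below. All names here (RelSizeCache, cached_relation_size, lease_secs) are invented for illustration and do not exist in PostgreSQL; the current time is passed in by the caller so that the lease logic is easy to test.

```c
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-backend cache entry for one relation file. */
typedef struct RelSizeCache
{
    off_t   size;        /* last size returned by lseek(), in bytes */
    time_t  fetched_at;  /* time at which that size was obtained */
    int     valid;       /* 0 until the first successful lseek() */
} RelSizeCache;

/*
 * Return the cached file size, calling lseek(SEEK_END) again only if the
 * cache is empty or the lease of lease_secs seconds has expired.
 * Returns -1 if lseek() fails; the cache is left unchanged in that case.
 * Note that lseek() moves the file offset of fd to the end of the file.
 */
static off_t
cached_relation_size(RelSizeCache *cache, int fd, time_t now, time_t lease_secs)
{
    if (!cache->valid || now - cache->fetched_at >= lease_secs)
    {
        off_t size = lseek(fd, 0, SEEK_END);

        if (size < 0)
            return -1;

        cache->size = size;
        cache->fetched_at = now;
        cache->valid = 1;
    }
    return cache->size;
}
```

With a 10 second lease, a backend planning thousands of queries per second against the same relation would issue one lseek per lease period instead of one per query.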
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company