Hello David and Jakub,
On Mon, Mar 23, 2026 at 1:35 PM David Geier <[email protected]> wrote:

> Hi!
>
> On 23.03.2026 12:22, Jakub Wartak wrote:
> > On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <[email protected]> wrote:
> >>
> >> Hello hackers,
> >>
> >> This comes from Tomas's patch idea on his website[1]; I thought this
> >> patch makes sense to have.
> >>
> >> PostgreSQL's virtual file descriptor (VFD) layer maintains a
> >> per-backend cache of open file descriptors, bounded by
> >> max_files_per_process (default 1000). When the cache is full, the
> >> least-recently-used entry is evicted and its OS fd closed so that a
> >> new file can be opened. On the next access to that file, open() must
> >> be called again, incurring a syscall that a larger cache would have
> >> avoided.
>
> That's one use-case. The other one that I've recently come across is
> just knowing how many VFD cache entries there are in the first place.
>
> While the number of open files is bounded by max_files_per_process, the
> number of cache entries is unbounded. A large database can easily have
> hundreds of thousands of files due to our segmentation scheme.
>
> Workloads that access a big portion of these files can end up spending
> a considerable amount of memory on the VFD cache. For example: 100,000
> VFD entries per backend * 80 bytes per VFD = ~7.6 MiB. With 1000
> backends, that's roughly 7.5 GiB just for VFD entries, assuming each
> backend accumulates that many entries over time.
>
> A production database I looked at recently had ~300,000 files and many
> thousands of backends. It spent close to 30 GiB on the VFD cache.
>
> I've looked at struct vfd and some simple changes to the struct would
> already cut memory consumption in half. I can look into that.
>
> Thoughts?

Looking forward to this.

What also bothers me in this space is that if a backend allocates 100K
entries in the VFD cache, the cache is never shrunk again; it only grows
(whenever it needs more than its lifetime maximum) until the backend
dies. Reusing free entries instead of allocating new ones is useful, but
it means a spike in file openings can leave a long-lived backend holding
far more cache than it will ever need again. I don't imagine this is
common, though. What do you think about this issue, from your experience?

>
> In the Linux kernel, each process that opens a file gets its own struct
> file, and sizeof(struct file) is ~200 bytes. Hence, increasing
> max_files_per_process can measurably impact memory consumption if
> changed carelessly. We should document that.
>
> But I guess in most cases it's rather about changing it from 1k to 2k,
> rather than changing it from 1k to 100k.
>
Indeed, I agree with this.

>
> >> AFAIK there is currently no way from inside PostgreSQL to know
> >> whether fd cache pressure is occurring.
> >>
> >> The implementation is trivial: because the VFD cache is strictly
> >> per-backend, the counters are also per-backend and require no shared
> >> memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction)
> >> update fields in PendingVfdCacheStats directly from fd.c.
> >>
> >> I find this somewhat useful, and I would love to hear whether others
> >> think it is useful or not.
> >
> > Hi,
> >
> > My $0.02: for this to be useful, it would need to allow viewing the
> > global VFD cache picture (across all backends), not just the *current*
> > backend's. Applications wouldn't call this function anyway, because
> > they would have to be modified.
>
> +1
>
Would it be reasonable to have both? I've changed the way I was thinking
about it. It's clear one would want to monitor global VFD cache behavior,
since its configuration is global, but its effect is mostly
backend-specific. That leads me to think it would also be useful to
maintain per-backend metrics, to help identify which backends are putting
the most pressure on the VFD cache.

>
> > In order to get that, you technically should collect the hits/misses
> > in a local pending pgstat area (see e.g. pgstat_io, or the simpler
> > pgstat_bgwriter/checkpointer), as you already do with
> > PendingVfdCacheStats, but then copy them to the shared-memory pgstat
> > area (with some LWLock* protection) so that it becomes queryable.
>
> I would include here the sum of VFD cache entries across all backends
> and the total VFD cache size.
>
I'll be doing this soon.

>
> --
> David Geier
>

Regards,
Ayoub Kazar
