Hi,

On 2025-04-06 13:51:34 +0200, Tomas Vondra wrote:
> On 4/6/25 00:29, Andres Freund wrote:
> >> +
> >> +          if (firstNumaTouch)
> >> +                  elog(DEBUG1, "NUMA: page-faulting the buffercache for 
> >> proper NUMA readouts");
> > 
> > Over the patchseries the related code is duplicated. Seems like it'd be good
> > to put it into pg_numa instead?  This seems like the thing that's good to
> > abstract away in one central spot.
> > 
> 
> Abstract away which part, exactly? I thought about moving some of the
> code to port/pg_numa.c, but it didn't seem worth it.

The easiest would be to just have a single function that does this for the
whole shared memory allocation, without having to integrate it with the
per-allocation or per-buffer loop.


> > 
> >> +                  /*
> >> +                   * If we have multiple OS pages per buffer, fill those 
> >> in too. We
> >> +                   * always want at least one OS page, even if there are 
> >> multiple
> >> +                   * buffers per page.
> >> +                   *
> >> +                   * Altough we could query just once per each OS page, 
> >> we do it
> >> +                   * repeatably for each Buffer and hit the same address 
> >> as
> >> +                   * move_pages(2) requires page aligment. This also 
> >> simplifies
> >> +                   * retrieval code later on. Also NBuffers starts from 1.
> >> +                   */
> >> +                  for (j = 0; j < Max(1, pages_per_buffer); j++)
> >> +                  {
> >> +                          char       *buffptr = (char *) BufferGetBlock(i 
> >> + 1);
> >> +
> >> +                          fctx->record[idx].bufferid = bufferid;
> >> +                          fctx->record[idx].numa_page = j;
> >> +
> >> +                          os_page_ptrs[idx]
> >> +                                  = (char *) TYPEALIGN(os_page_size,
> >> +                                                                          
> >>  buffptr + (os_page_size * j));
> > 
> > FWIW, this bit here is the most expensive part of the function itself, as 
> > the
> > compiler has no choice than to implement it as an actual division, as
> > os_page_size is runtime variable.
> > 
> > It'd be fine to leave it like that, the call to numa_move_pages() is way 
> > more
> > expensive. But it shouldn't be too hard to do that alignment once, rather 
> > than
> > having to do it over and over.
> > 
> 
> Division? It's entirely possible I'm missing something obvious, but I
> don't see any divisions in this code.

Oops. The division was only added in a subsequent patch, not the quoted
code. At the time it was:

+                               /* calculate ID of the OS memory page */
+                               fctx->record[idx].numa_page
+                                       = ((char *) os_page_ptrs[idx] - 
startptr) / os_page_size;


Greetings,

Andres Freund


Reply via email to