Hi, On 2025-04-06 13:51:34 +0200, Tomas Vondra wrote: > On 4/6/25 00:29, Andres Freund wrote: > >> + > >> + if (firstNumaTouch) > >> + elog(DEBUG1, "NUMA: page-faulting the buffercache for > >> proper NUMA readouts"); > > > > Over the patchseries the related code is duplicated. Seems like it'd be good > > to put it into pg_numa instead? This seems like the thing that's good to > > abstract away in one central spot. > > > > Abstract away which part, exactly? I thought about moving some of the > code to port/pg_numa.c, but it didn't seem worth it.
The easiest would be to just have a single function that does this for the whole shared memory allocation, without having to integrate it with the per-allocation or per-buffer loop. > > > >> + /* > >> + * If we have multiple OS pages per buffer, fill those > >> in too. We > >> + * always want at least one OS page, even if there are > >> multiple > >> + * buffers per page. > >> + * > >> + * Altough we could query just once per each OS page, > >> we do it > >> + * repeatably for each Buffer and hit the same address > >> as > >> + * move_pages(2) requires page aligment. This also > >> simplifies > >> + * retrieval code later on. Also NBuffers starts from 1. > >> + */ > >> + for (j = 0; j < Max(1, pages_per_buffer); j++) > >> + { > >> + char *buffptr = (char *) BufferGetBlock(i > >> + 1); > >> + > >> + fctx->record[idx].bufferid = bufferid; > >> + fctx->record[idx].numa_page = j; > >> + > >> + os_page_ptrs[idx] > >> + = (char *) TYPEALIGN(os_page_size, > >> + > >> buffptr + (os_page_size * j)); > > > > FWIW, this bit here is the most expensive part of the function itself, as > > the > > compiler has no choice than to implement it as an actual division, as > > os_page_size is runtime variable. > > > > It'd be fine to leave it like that, the call to numa_move_pages() is way > > more > > expensive. But it shouldn't be too hard to do that alignment once, rather > > than > > having to do it over and over. > > > > Division? It's entirely possible I'm missing something obvious, but I > don't see any divisions in this code. Oops. The division was only added in a subsequent patch, not the quoted code. At the time it was: + /* calculate ID of the OS memory page */ + fctx->record[idx].numa_page + = ((char *) os_page_ptrs[idx] - startptr) / os_page_size; Greetings, Andres Freund