On Fri, Apr 4, 2025 at 8:50 AM Bertrand Drouvot <bertranddrouvot...@gmail.com> wrote: > > Hi, > > On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote: > > On 4/3/25 15:12, Jakub Wartak wrote: > > > On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra <to...@vondra.me> wrote: > > > > > >> ... > > >> > > >> So unless someone can demonstrate a use case where this would matter, > > >> I'd not worry about it too much. > > > > > > OK, fine for me - just 3 cols for pg_buffercache_numa is fine for me, > > > it's just that I don't have cycles left today and probably lack skills > > > (i've never dealt with arrays so far) thus it would be slow to get it > > > right... but I can pick up anything tomorrow morning. > > > > > > > OK, I took a stab at reworking/simplifying this the way I proposed. > > Here's v24 - needs more polishing, but hopefully enough to show what I > > had in mind. > > > > It does these changes: > > > > 1) Drops 0002 with the pg_buffercache refactoring, because the new view > > is not "extending" the existing one. > > I think that makes sense. One would just need to join on the pg_buffercache > view to get more information about the buffer if needed. > > The pg_buffercache_numa_pages() doc needs an update though as I don't think > that > "+ The <function>pg_buffercache_numa_pages()</function> provides the same > information as <function>pg_buffercache_pages()</function>" is still true. > > > 2) Reworks pg_buffercache_num to return just three columns, bufferid, > > page_num and node_id. page_num is a sequence starting from 0 for each > > buffer. > > +1 on the idea > > > 3) It now builds an array of records, with one record per buffer/page. > > > > 4) I realized we don't really need to worry about buffers_per_page very > > much, except for logging/debugging. There's always "at least one page" > > per buffer, even if an incomplete one, so we can do this: > > > > os_page_count = NBuffers * Max(1, pages_per_buffer); > > > > and then > > > > for (i = 0; i < NBuffers; i++) > > { > > for (j = 0; j < Max(1, pages_per_buffer); j++) > > That's a nice simplification as we always need to take care of at least one > page > per buffer. > > > and everything just works fine, I think. > > I think the same. > > > Opinions? I personally find this much cleaner / easier to understand. > > I agree that's easier to understand and that that looks correct. > > A few random comments: > > === 1 > > It looks like that firstNumaTouch is not set to false anymore. > > === 2 > > + pfree(os_page_status); > + pfree(os_page_ptrs); > > Not sure that's needed, we should be in a short-lived memory context here > (ExprContext or such). > > === 3 > > + ro_volatile_var = *(uint64 *)ptr > > space missing before "ptr"? >
+my feedback as I've noticed that Bertrand already provided a review. Right, the code is now simple , and that Max() is brilliant. I've attached some review findings as .txt 0001 100%LGTM 0002 doc fix + pgident + Tomas, you should take Authored-by yourself there for sure, I couldn't pull this off alone in time! So big thanks! 0003 fixes elog UINT64_FORMAT for ming32 (a little bit funny to have NUMA on ming32...:)) When started with interleave=all on serious hardware, I'm getting (~5s for s_b=64GB) from pg_buffercache_numa node_id | count ---------+--------- 3 | 2097152 0 | 2097152 2 | 2097152 1 | 2097152 so this is valid result (2097152*4 numa nodes*8192 buffer size/1024/1024/1024 = 64GB) Also with pgbench -i -s 20 , after ~8s: select c.relname, n.node_id, count(*) from pg_buffercache_numa n join pg_buffercache b on (b.bufferid = n.bufferid) join pg_class c on (c.relfilenode = b.relfilenode) group by c.relname, n.node_id order by count(*) desc; relname | node_id | count -----------------------------------------------+---------+------- pgbench_accounts | 2 | 8217 pgbench_accounts | 0 | 8190 pgbench_accounts | 3 | 8189 pgbench_accounts | 1 | 8187 pg_statistic | 2 | 32 pg_operator | 2 | 14 pg_depend | 3 | 14 [..] pg_shm_allocations_numa also looks good. I think v24+tiny fixes is good enough to go in. -J.
diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml index 59dbbd2b25e..3d9032efafb 100644 --- a/doc/src/sgml/pgbuffercache.sgml +++ b/doc/src/sgml/pgbuffercache.sgml @@ -45,9 +45,10 @@ </para> <para> - The <function>pg_buffercache_numa_pages()</function> provides the same information - as <function>pg_buffercache_pages()</function> but is slower because it also - provides the <acronym>NUMA</acronym> node ID per shared buffer entry. + The <function>pg_buffercache_numa_pages()</function> provides + <acronym>NUMA</acronym> node mappings for shared buffer entries. This + information is not part of <function>pg_buffercache_pages()</function> + itself, as it is much slower to retrieve. The <structname>pg_buffercache_numa</structname> view wraps the function for convenient use. </para>
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c index d653f4af394..5526dee7171 100644 --- a/contrib/pg_buffercache/pg_buffercache_pages.c +++ b/contrib/pg_buffercache/pg_buffercache_pages.c @@ -67,7 +67,7 @@ typedef struct uint32 bufferid; int32 numa_page; int32 numa_node; -} BufferCacheNumaRec; +} BufferCacheNumaRec; /* * Function context for data persisting over repeated calls. @@ -79,7 +79,7 @@ typedef struct int pages_per_buffer; int os_page_size; BufferCacheNumaRec *record; -} BufferCacheNumaContext; +} BufferCacheNumaContext; /* @@ -454,7 +454,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS) */ for (j = 0; j < Max(1, pages_per_buffer); j++) { - char *buffptr = (char *) BufferGetBlock(i + 1); + char *buffptr = (char *) BufferGetBlock(i + 1); fctx->record[idx].bufferid = bufferid; fctx->record[idx].numa_page = j; @@ -480,8 +480,8 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS) elog(ERROR, "failed NUMA pages inquiry: %m"); /* - * Update the entries with NUMA node ID. The status array is indexed the - * same way as the entry index. + * Update the entries with NUMA node ID. The status array is indexed + * the same way as the entry index. */ for (i = 0; i < os_page_count; i++) {
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 4313e6db62c..e26af975a7d 100644 --- a/src/backend/storage/ipc/shmem.c +++ b/src/backend/storage/ipc/shmem.c @@ -677,7 +677,7 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS) if (s >= 0 && s <= max_nodes) nodes[s]++; else - elog(ERROR, "invalid NUMA node id outside of allowed range [0, %ld]: %d", max_nodes, s); + elog(ERROR, "invalid NUMA node id outside of allowed range [0, " UINT64_FORMAT "]: %d", max_nodes, s); } for (i = 0; i <= max_nodes; i++)