Hi,

On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote:
> On 6/23/25 23:47, Tomas Vondra wrote:
> > ...
> > 
> > Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> > calculation.
> >
> 
> I think it's likely something like this.

I think the same.

> I noticed that if I modify
> pg_buffercache_numa_pages() to query the addresses one by one, it works.
> And when I increase the number, it stops working somewhere between 16k
> and 17k items.

Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
pg_numa_query_pages() is called on chunks of at most 16 pages, but fails as
soon as a chunk has more than 16 pages.

The attached test_chunk_size.c confirms it:

$ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
$ ./test_chunk_size
 1 pages: SUCCESS (0 errors)
 2 pages: SUCCESS (0 errors)
 3 pages: SUCCESS (0 errors)
 4 pages: SUCCESS (0 errors)
 5 pages: SUCCESS (0 errors)
 6 pages: SUCCESS (0 errors)
 7 pages: SUCCESS (0 errors)
 8 pages: SUCCESS (0 errors)
 9 pages: SUCCESS (0 errors)
10 pages: SUCCESS (0 errors)
11 pages: SUCCESS (0 errors)
12 pages: SUCCESS (0 errors)
13 pages: SUCCESS (0 errors)
14 pages: SUCCESS (0 errors)
15 pages: SUCCESS (0 errors)
16 pages: SUCCESS (0 errors)
17 pages: 1 errors
Threshold: 17 pages

No error if -m32 is not used.

> It may be a coincidence, but I suspect it's related to the sizeof(void
> *) being 8 in the kernel, but only 4 in the chroot. So the userspace
> passes an array of 4-byte items, but kernel interprets that as 8-byte
> items. That is, we call
> 
> long move_pages(int pid, unsigned long count, void *pages[.count], const
> int nodes[.count], int status[.count], int flags);
> 
> Which (I assume) just passes the parameters to kernel. And it'll
> interpret them per kernel pointer size.
>

I also suspect something in this area...
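
To make the suspected mismatch concrete, here is a small sketch of what a
reader expecting 8-byte entries would see in an array of 4-byte addresses.
It only illustrates the hypothesis above, it is not a confirmed diagnosis.
The helper name read_as_u64() and the sample addresses are made up for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read entry i of an array of 32-bit addresses as if
 * the entries were 64 bits wide, which is what the suspected mismatch
 * would amount to.
 */
uint64_t
read_as_u64(const uint32_t *addrs, size_t naddrs, size_t i)
{
    uint64_t v;

    /* An 8-byte reader consumes two 4-byte entries per element. */
    assert(2 * i + 1 < naddrs);
    memcpy(&v, (const unsigned char *) addrs + i * sizeof(v), sizeof(v));
    return v;
}
```

With addrs = {0x1000, 0x2000, 0x3000, 0x4000}, entry 0 read this way fuses
0x1000 and 0x2000 into a single bogus 64-bit address (which half ends up in
the high bits depends on the host byte order), and only half as many entries
fit in the same array.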

> If this is what's happening, I'm not sure what to do about it ...

We could work in chunks (16?) on 32-bit builds, but that would probably
degrade performance (we do mention it in the doc, though). Also, would 16
always be a correct chunk size?
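
For the sake of discussion, the chunking could look roughly like the sketch
below. This is not a patch: query_pages_chunked(), page_query_fn and
fake_query() are hypothetical names, the callback stands in for whatever ends
up calling move_pages(), and fake_query() only loosely mimics what we see
here (it simply refuses more than 16 pages per call):

```c
#include <assert.h>
#include <stddef.h>

/* A batch query: fills status[0..count-1] and returns 0 on success. */
typedef long (*page_query_fn) (unsigned long count, void **pages, int *status);

/*
 * Hypothetical wrapper: split one large query into calls of at most
 * chunk_size pages each. Returns 0 on success, or the first non-zero
 * result returned by query().
 */
long
query_pages_chunked(page_query_fn query, unsigned long count,
                    void **pages, int *status, unsigned long chunk_size)
{
    assert(chunk_size > 0);

    for (unsigned long done = 0; done < count; done += chunk_size)
    {
        unsigned long n = count - done;
        long        rc;

        if (n > chunk_size)
            n = chunk_size;

        rc = query(n, pages + done, status + done);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Number of times fake_query() has been called. */
int fake_calls = 0;

/* Stand-in for the real query (assumption: fails above 16 pages per call). */
long
fake_query(unsigned long count, void **pages, int *status)
{
    (void) pages;
    fake_calls++;
    if (count > 16)
        return -1;
    for (unsigned long i = 0; i < count; i++)
        status[i] = 0;
    return 0;
}
```

With a chunk size of 16, a 40-page request becomes three calls (16 + 16 + 8).
The 16 here is only the value observed in this test, not a limit known to be
safe everywhere.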

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
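/*
 * test_chunk_size.c
 *
 * Find the smallest number of pages for which a move_pages() status query
 * reports errors. Build with -m32 to reproduce the problem.
 */
#include <stdio.h>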
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <errno.h>

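/*
 * Query the NUMA node of chunk_size pages with a single move_pages() call.
 * Returns the number of pages whose status is negative, or -1 if an
 * allocation or the syscall itself fails.
 */
int test_chunk_size(int chunk_size) {
    size_t page_size = sysconf(_SC_PAGESIZE);

    void *mem = malloc(page_size * chunk_size);
    if (!mem) return -1;

    /* Touch every page so that it is actually backed by memory. */
    memset(mem, 0xFF, page_size * chunk_size);

    void **ptrs = malloc(sizeof(void*) * chunk_size);
    int *status = malloc(sizeof(int) * chunk_size);
    if (!ptrs || !status) {
        free(mem);
        free(ptrs);
        free(status);
        return -1;
    }

    for (int j = 0; j < chunk_size; j++) {
        ptrs[j] = (char*)mem + (j * page_size);
        status[j] = -999;   /* sentinel, stays negative if the kernel never writes this slot */
    }

    /* nodes == NULL: nothing is moved, status[] receives each page's node. */
    long result = syscall(SYS_move_pages, 0, chunk_size, ptrs, NULL, status, 0);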
    
    int errors = 0;
    if (result == 0) {
        for (int j = 0; j < chunk_size; j++) {
            if (status[j] < 0) errors++;
        }
    }
    
    free(mem);
    free(ptrs);
    free(status);
    
    return (result == 0) ? errors : -1;
}

int main(void) {
    int threshold = -1;
    
    // Test sizes from 1 to 40 pages
    for (int size = 1; size <= 40; size++) {
        int errors = test_chunk_size(size);
        
        if (errors == -1) {
            if (threshold == -1) threshold = size;
            break;
        } else if (errors == 0) {
            printf("%2d pages: SUCCESS (0 errors)\n", size);
        } else {
            printf("%2d pages: %d errors\n", 
                   size, errors);
            threshold = size;
            break;
        }
    }
    
    if (threshold > 0)
        printf("Threshold: %d pages\n", threshold);
    else
        printf("No threshold found in range 1-40 pages\n");
    
    return 0;
}
