On 3/1/23 18:48, Jehan-Guillaume de Rorthais wrote:
> Hi Tomas,
>
> On Tue, 28 Feb 2023 20:51:02 +0100
> Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
>> On 2/28/23 19:06, Jehan-Guillaume de Rorthais wrote:
>>> * HashBatchContext goes up to 1441MB after 240s then stays flat until
>>>   the end (400s as the last record)
>>
>> That's interesting. We're using HashBatchContext for very few things, so
>> what could consume so much memory? But e.g. the number of buckets should
>> be limited by work_mem, so how could it get to 1.4GB?
>>
>> Can you break at ExecHashIncreaseNumBatches/ExecHashIncreaseNumBuckets
>> and print how many batches/buckets there are?
>
> I did this test this morning.
>
> Batches and buckets increased really quickly to 1048576/1048576.
>

OK. I think 1M buckets is mostly expected for work_mem=64MB. It means
buckets will use 8MB, which leaves ~56B per tuple (we're aiming for
fillfactor 1.0).

But 1M batches? I guess that could be problematic. It doesn't seem like
much, but we need 1M files on each side - 1M for the hash table, 1M for
the outer relation. That's 16MB of pointers, but the files are BufFile
and we keep an 8kB buffer for each of them. That's ~16GB right there :-(
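
Just to make that arithmetic explicit (an illustrative sketch only,
assuming 8-byte pointers and the 8kB per-BufFile buffer mentioned above):

    nbatch = 1_048_576
    nfiles = 2 * nbatch                                       # inner + outer side
    print(nfiles * 8    / 1024**2, "MB of BufFile pointers")  # ~16 MB
    print(nfiles * 8192 / 1024**3, "GB of BufFile buffers")   # ~16 GB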

In practice it probably won't be that bad, because not all files will be
allocated/opened concurrently (especially if this is due to many tuples
having the same value). Assuming that's what's happening here, ofc.

> ExecHashIncreaseNumBatches was really chatty, having hundreds of thousands
> of calls, always short-circuited at 1048576, I guess because of the
> conditional block «/* safety check to avoid overflow */» appearing early
> in this function.
>

Hmmm, that's a bit weird, no? I mean, the check is

    /* safety check to avoid overflow */
    if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
        return;

Why would it stop at 1048576? It certainly is not higher than INT_MAX/2,
and with MaxAllocSize = ~1GB the second value should be ~33M. So what's
happening here?

> I disabled the breakpoint on ExecHashIncreaseNumBatches a few times to
> make the query run faster. Enabling it again at 19.1GB of memory
> consumption, it stayed silent till the memory exhaustion (around 21 or
> 22GB, I don't remember exactly).
>
> The breakpoint on ExecHashIncreaseNumBuckets triggered a few times at the
> beginning, and one last time close to the end of the query execution.
>
>>> Any idea? How could I help to get a better idea of whether a leak is
>>> actually occurring, and where exactly?
>>
>> Investigating memory leaks is tough, especially for generic memory
>> contexts like ExecutorState :-( Even more so when you can't reproduce it
>> on a machine with custom builds.
>>
>> What I'd try is this:
>>
>> 1) attach breakpoints to all returns in AllocSetAlloc(), printing the
>>    pointer and size for the ExecutorState context, so something like
>>
>>      break aset.c:783 if strcmp("ExecutorState",context->header.name) == 0
>>      commands
>>      print MemoryChunkGetPointer(chunk) size
>>      cont
>>      end
>>
>> 2) do the same for AllocSetFree()
>>
>> 3) Match the palloc/pfree calls (using the pointer address) to determine
>>    which ones are not freed, and do some stats on the size. Usually
>>    there's only a couple of distinct sizes that account for most of the
>>    leaked memory.
>
> So here is what I ended up with this afternoon, using files, lines and
> macros from REL_11_18:
>
>   set logging on
>   set pagination off
>
>   break aset.c:781 if strcmp("ExecutorState",context.name) == 0
>   commands 1
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont
>   end
>
>   break aset.c:820 if strcmp("ExecutorState",context.name) == 0
>   commands 2
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont
>   end
>
>   break aset.c:979 if strcmp("ExecutorState",context.name) == 0
>   commands 3
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont
>   end
>
>   break AllocSetFree if strcmp("ExecutorState",context.name) == 0
>   commands 4
>   print pointer
>   cont
>   end
>
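
For the matching step (3), a rough post-processing sketch along these lines
might do (untested; it assumes the raw gdb log has first been collapsed to
one event per line, "alloc <ptr> <size>" or "free <ptr>" - that
pre-processing pass and the file name are made up for illustration):

    from collections import Counter

    live = {}                          # pointer -> size of chunks not freed yet

    with open("palloc_events.log") as f:
        for line in f:
            fields = line.split()
            if fields[0] == "alloc":
                live[fields[1]] = int(fields[2])
            elif fields[0] == "free":
                # pop() with a default ignores pointers we never saw allocated
                # (e.g. chunks allocated before the breakpoints were set)
                live.pop(fields[1], None)

    by_size = Counter()
    for size in live.values():
        by_size[size] += size          # bytes still allocated, per chunk size

    print("not freed:", len(live), "chunks,", sum(live.values()), "bytes")
    for size, total in by_size.most_common(10):
        print("chunk size %d: %d bytes in total" % (size, total))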

> So far, gdb has had more than 3h of CPU time and is eating 2.4GB of
> memory. The backend had only 3'32" of CPU time:
>
>      VIRT     RES    SHR S  %CPU %MEM     TIME+ COMMAND
>   2727284    2.4g  17840 R  99.0  7.7 181:25.07 gdb
>   9054688  220648 103056 t   1.3  0.7   3:32.05 postmaster
>
> Interestingly, the RES memory of the backend has not exploded yet, but
> VIRT is already high.
>
> I suppose the query will run for some more hours; hopefully gdb will not
> exhaust the memory in the meantime...
>
> You'll find some intermediate stats I already collected in the attachment:
>
> * break 1, 2 and 3 are from AllocSetAlloc, break 4 is from AllocSetFree.
> * most of the non-freed chunks were allocated at the very beginning,
>   before the 5000th allocation call, out of almost 1M calls so far.
> * 3754 of them have a chunk->size of 0
> * it seems there are some buggy stats or data:
>
>     # this one actually really comes from the gdb log
>     0x38a77b8: break=3 num=191 sz=4711441762604810240 (weird sz)
>
>     # this one might be a bug in my script
>     0x2: break=2 num=945346 sz=2 (weird address)
>
> * ignoring the weird size requested during the 191st call, the total
>   amount of non-freed memory is currently 5488MB
>
> I couldn't print "size" as it is optimized away, that's why I tracked
> chunk->size... Is there anything wrong with my current run and gdb log?
>

There's definitely something wrong. The size should not be 0, and neither
should it be > 1GB. I suspect it's because some of the variables get
optimized out, and gdb just uses some nonsense :-(

I guess you'll need to debug the individual breakpoints and see what's
available. It probably depends on the compiler version, etc. For example,
I don't see the "chunk" for breakpoint 3, but "chunk_size" works and I can
print the chunk pointer with a bit of arithmetic:

    p (block->freeptr - chunk_size)

I suppose similar gymnastics could work for the other breakpoints.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company