Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator
On 3/11/20 11:08 AM, Kevin Zheng wrote: On 3/10/20 9:56 PM, Mark Wedel wrote: Not sure if still the case, but at one time, performance of allocating thousands of new objects individually increased lag (when loading new maps). So doing it in blocks of 500 or whatever was faster. At one point I had put in option to do individually allocation, simply for the reason above - if using a memory debugger, that block of objects failed to work. Allocating one object at a time does make mapfile_load() slower (all numbers in us): x CF Object Allocator + 1 object per calloc() +--+ | x+ | |x xx x x*x x+ + + + + + ++| | |_A_M__| |___A_M_|| +--+ N Min MaxMedian AvgStddev x 10 16154 17814 17309.5 17148.7 579.95633 + 10 17560 20576 19169 19021.9 1033.7963 Difference at 95.0% confidence 1873.2 +/- 787.548 10.9233% +/- 4.71741% (Student's t, pooled s = 838.178) This measurement was made with n=10 loading /world/world_105_115 with a warm disk cache, with a newly started server each time so that memory fragmentation hasn't occurred. Time elapsed was measured with a monotonic clock. It looks like 1-object-per-alloc increases the median by 2 ms. For tiled world maps, at worse case 5 maps are loaded at a time, for a total additional latency of 10 ms, or about a 11% increase. I thought it was be slower, but for modern hardware, that is not very much (going back to the system that had 16 megs of memory, its CPU also run at 25 mhz I think, so ignoring architectural differences, still clearly a lot faster). Given those old system were maybe 100 times slower, that mean loading the map would take 200 ms, and given tick time hasn't really changed, that means lag. However, even in current architecture, this really isn't a problem - since objects are never freed, eventually the game gets to a point where it should enough have objects on the free list that it doesn't need to allocate objects very often. So after the server first starts up, things may be sort of slow until it sort of reaches that point. ___ crossfire mailing list crossfire@metalforge.org http://mailman.metalforge.org/mailman/listinfo/crossfire
Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator
On 3/10/20 9:56 PM, Mark Wedel wrote: > Not sure if still the case, but at one time, performance of allocating > thousands of new objects individually increased lag (when loading new > maps). So doing it in blocks of 500 or whatever was faster. At one > point I had put in option to do individually allocation, simply for the > reason above - if using a memory debugger, that block of objects failed > to work. Allocating one object at a time does make mapfile_load() slower (all numbers in us): x CF Object Allocator + 1 object per calloc() +--+ | x+ | |x xx x x*x x+ + + + + + ++| | |_A_M__| |___A_M_|| +--+ N Min MaxMedian AvgStddev x 10 16154 17814 17309.5 17148.7 579.95633 + 10 17560 20576 19169 19021.9 1033.7963 Difference at 95.0% confidence 1873.2 +/- 787.548 10.9233% +/- 4.71741% (Student's t, pooled s = 838.178) This measurement was made with n=10 loading /world/world_105_115 with a warm disk cache, with a newly started server each time so that memory fragmentation hasn't occurred. Time elapsed was measured with a monotonic clock. It looks like 1-object-per-alloc increases the median by 2 ms. For tiled world maps, at worse case 5 maps are loaded at a time, for a total additional latency of 10 ms, or about a 11% increase. -- Kevin Zheng kevinz5...@gmail.com | kev...@berkeley.edu XMPP: kev...@eecs.berkeley.edu ___ crossfire mailing list crossfire@metalforge.org http://mailman.metalforge.org/mailman/listinfo/crossfire
Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator
Thanks, Mark, for the review. On 3/10/20 9:56 PM, Mark Wedel wrote: >> The object allocator allocates OBJ_EXPAND objects at a time and manages >> its own free list. It is faster at allocating many objects at once, but >> defeats memory debuggers and mitigation techniques. It also does not >> return unused memory to the operating system. > > Not sure if still the case, but at one time, performance of allocating > thousands of new objects individually increased lag (when loading new > maps). So doing it in blocks of 500 or whatever was faster. At one > point I had put in option to do individually allocation, simply for the > reason above - if using a memory debugger, that block of objects failed > to work. I have not measured the latency loading maps. I'll measure this and hopefully get back with some results. > Minor nit - even free() on a memory object does not return it to the OS > - it tells malloc that block of memory is available again, and on a > future call, malloc will re-use it, though you can get fragmentation > (malloc may decide to use that memory for something smaller than an > object). This may be true for some allocators, but not jemalloc. Now, not all allocators are jemalloc and that might be important to consider. In jemalloc, unused dirty pages are kept around for re-allocation but eventually "returned" to the operating system via madvise(...MADV_FREE). top confirms that the server's resident memory can drop below its peak resident memory when no swapping takes place. The jemalloc statistics show the number of madvise() calls. Linux appears to have gained MADV_FREE since 4.5, but I'm not sure if the standard library allocator uses it. >> object_free2() adds freed objects to a free list that is reclaimed at >> the end of each server tick. It cannot immediately free object memory, >> because some callers expect to be able to query it for FLAG_FREED to >> detect object removal. > > querying for flag freed was always a bit buggy (it sort of relies that > the given object won't get re-used, which works because the program is > singled threaded and these freed objects will be last to be allocated > when new objects are needed again). > > Also, as noted above, since memory is not returned to the OS, only the > allocator, having crossfire itself keep a freelist makes perfect sense - > it is going to need those objects again at some point, and objects are > the main use of dynamic memory. If the program did something like was > in one state where it needed a bunch of malloc'd items, and did > something with them, no longer needed them, but then needed a large > amount of memory for a different object type, freeing them would make > sense, because in that second allocation, it could re-use that freed up > space. I suppose this boils down to how much slower calloc() is compared to grabbing from the internal freelist, and whether free() returns memory on enough supported operating systems or not. -- Kevin Zheng kevinz5...@gmail.com | kev...@berkeley.edu XMPP: kev...@eecs.berkeley.edu ___ crossfire mailing list crossfire@metalforge.org http://mailman.metalforge.org/mailman/listinfo/crossfire
Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator
On 3/9/20 11:24 PM, Kevin Zheng wrote: Hi there, After experiencing MAX_OBJECTS-related excessive swapping on Invidious that allowed Saromok's souped-up goblins to poison and kill my character, I've decided to take another look at memory management. I may be missing some historical context for why things are the way they currently are in Crossfire, so I welcome feedback on what these proposed changes might do or not do. Crossfire goes way back (30+ years at this point), and the power/abilities of those machines back then were much more limited. I think back then, the computer I was using had 16 megs of memory. As such, things like memory consumption was much more a concern, and while memory usage has gone up (more stuff stored in the object, larger maps, etc), it has not gone up at the same pace as computer power has gone up. So a lot of those old assumptions are no longer true. The rationale for each change is included in each patch, but reproduced here for convenience: Subject: [PATCH 1/2] Retire MAX_OBJECTS MAX_OBJECTS has probably outlived its usefulness. It was previously "no hard limit," only serving to trigger map swapping immediately after loading a map, if the number of used objects exceeded MAX_OBJECTS. At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where n is the number of maps in memory. This happens immediately after loading a very large map. The server takes O(n) to search for the map with the smallest remaining timeout and swaps it, and does this n times or until enough memory is freed. If the other maps are small, this does not free much memory and causes the "performance hit" mentioned in the comments. This was observed on Invidious, where the server experienced delays of up to 700 ms immediately after loading a large map due to excessive swapping. Removing MAX_OBJECTS does not significantly change the server's allocation pattern because the server never frees memory (it maintains an internal free list) and because maps are swapped out based on timeout at the end of each tick anyway. This totally makes sense - as alluded to above, at one time, memory was constrained and one wanted to make sure crossfire did not use so much memory as to cause paging on the system it was running on. In theory, MAX_OBJECTS would get tuned by each administrator, based on how much memory their server had, but I think that rarely happened. Subject: [PATCH 2/2] Retire object allocator The object allocator allocates OBJ_EXPAND objects at a time and manages its own free list. It is faster at allocating many objects at once, but defeats memory debuggers and mitigation techniques. It also does not return unused memory to the operating system. Not sure if still the case, but at one time, performance of allocating thousands of new objects individually increased lag (when loading new maps). So doing it in blocks of 500 or whatever was faster. At one point I had put in option to do individually allocation, simply for the reason above - if using a memory debugger, that block of objects failed to work. Minor nit - even free() on a memory object does not return it to the OS - it tells malloc that block of memory is available again, and on a future call, malloc will re-use it, though you can get fragmentation (malloc may decide to use that memory for something smaller than an object). Rewrite object_new() to always allocate memory using calloc(). This will incur some memory overhead since an object struct does not fit perfectly into a nice allocator slab size. I suspect additional overhead should be pretty trivial, assuming the allocator is decent (and realistically, not much an issue anyways) object_free2() adds freed objects to a free list that is reclaimed at the end of each server tick. It cannot immediately free object memory, because some callers expect to be able to query it for FLAG_FREED to detect object removal. querying for flag freed was always a bit buggy (it sort of relies that the given object won't get re-used, which works because the program is singled threaded and these freed objects will be last to be allocated when new objects are needed again). Also, as noted above, since memory is not returned to the OS, only the allocator, having crossfire itself keep a freelist makes perfect sense - it is going to need those objects again at some point, and objects are the main use of dynamic memory. If the program did something like was in one state where it needed a bunch of malloc'd items, and did something with them, no longer needed them, but then needed a large amount of memory for a different object type, freeing them would make sense, because in that second allocation, it could re-use that freed up space. ___ crossfire mailing list crossfire@metalforge.org http://mailman.metalforge.org/mailman/listinfo/crossfire
[crossfire] Proposal: retire MAX_OBJECTS and object allocator
Hi there, After experiencing MAX_OBJECTS-related excessive swapping on Invidious that allowed Saromok's souped-up goblins to poison and kill my character, I've decided to take another look at memory management. I may be missing some historical context for why things are the way they currently are in Crossfire, so I welcome feedback on what these proposed changes might do or not do. The rationale for each change is included in each patch, but reproduced here for convenience: Subject: [PATCH 1/2] Retire MAX_OBJECTS MAX_OBJECTS has probably outlived its usefulness. It was previously "no hard limit," only serving to trigger map swapping immediately after loading a map, if the number of used objects exceeded MAX_OBJECTS. At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where n is the number of maps in memory. This happens immediately after loading a very large map. The server takes O(n) to search for the map with the smallest remaining timeout and swaps it, and does this n times or until enough memory is freed. If the other maps are small, this does not free much memory and causes the "performance hit" mentioned in the comments. This was observed on Invidious, where the server experienced delays of up to 700 ms immediately after loading a large map due to excessive swapping. Removing MAX_OBJECTS does not significantly change the server's allocation pattern because the server never frees memory (it maintains an internal free list) and because maps are swapped out based on timeout at the end of each tick anyway. Subject: [PATCH 2/2] Retire object allocator The object allocator allocates OBJ_EXPAND objects at a time and manages its own free list. It is faster at allocating many objects at once, but defeats memory debuggers and mitigation techniques. It also does not return unused memory to the operating system. Rewrite object_new() to always allocate memory using calloc(). This will incur some memory overhead since an object struct does not fit perfectly into a nice allocator slab size. object_free2() adds freed objects to a free list that is reclaimed at the end of each server tick. It cannot immediately free object memory, because some callers expect to be able to query it for FLAG_FREED to detect object removal. These changes were tested locally (mostly by running a character around) with 1) clang -fsanitize=address, which did not report any errors, and 2) with Jemalloc statistics enabled (see attached, jemalloc.out). As expected, after the change, the allocator bin seeing the most activity was 768 bytes (struct object is 656 bytes on x86-64). -- Kevin Zheng kevinz5...@gmail.com | kev...@berkeley.edu XMPP: kev...@eecs.berkeley.edu From fe2480277fa6e78db591fac3787a5779cb4a01f4 Mon Sep 17 00:00:00 2001 From: Kevin Zheng Date: Mon, 9 Mar 2020 09:54:01 -0700 Subject: [PATCH 1/2] Retire MAX_OBJECTS MAX_OBJECTS has probably outlived its usefulness. It was previously "no hard limit," only serving to trigger map swapping immediately after loading a map, if the number of used objects exceeded MAX_OBJECTS. At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where n is the number of maps in memory. This happens immediately after loading a very large map. The server takes O(n) to search for the map with the smallest remaining timeout and swaps it, and does this n times or until enough memory is freed. If the other maps are small, this does not free much memory and causes the "performance hit" mentioned in the comments. This was observed on Invidious, where the server experienced delays of up to 700 ms immediately after loading a large map due to excessive swapping. Removing MAX_OBJECTS does not significantly change the server's allocation pattern because the server never frees memory (it maintains an internal free list) and because maps are swapped out based on timeout at the end of each tick anyway. --- include/config.h | 41 - include/sproto.h | 1 - server/c_misc.c | 3 +-- server/monster.c | 2 +- server/server.c | 1 - server/swap.c| 53 6 files changed, 2 insertions(+), 99 deletions(-) diff --git a/include/config.h b/include/config.h index 4eb2ef92..7752c22a 100644 --- a/include/config.h +++ b/include/config.h @@ -336,8 +336,6 @@ * DMFILE - file with dm/wizard access lists * LOGFILE - where to log if using -daemon option * MAP_ - various map timeout and swapping parameters - * MAX_OBJECTS - how many objects to keep in memory. - * MAX_OBJECTS_LWM - only swap maps out if below that value * MOTD - message of the day - printed each time someone joins the game * PERM_FILE - limit play times * SHUTDOWN - used when shutting down the server @@ -427,45 +425,6 @@ /** Default time to reset. */ #define MAP_DEFAULTRESET 7200 -/** - * MAX_OBJECTS is no hard limit. If this limit is exceeded, Crossfire - * will look for maps which are already scheldued for swap