Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator

2020-03-11 Thread Mark Wedel

On 3/11/20 11:08 AM, Kevin Zheng wrote:

On 3/10/20 9:56 PM, Mark Wedel wrote:

  Not sure if still the case, but at one time, performance of allocating
thousands of new objects individually increased lag (when loading new
maps).  So doing it in blocks of 500 or whatever was faster.  At one
point I had put in option to do individually allocation, simply for the
reason above - if using a memory debugger, that block of objects failed
to work.


Allocating one object at a time does make mapfile_load() slower (all
numbers in us):

x CF Object Allocator
+ 1 object per calloc()
+--+
|  x+  |
|x xx  x  x*x  x+   +  + + + +   ++|
|  |_A_M__|   |___A_M_||
+--+
 N   Min   MaxMedian   AvgStddev
x  10 16154 17814   17309.5   17148.7 579.95633
+  10 17560 20576 19169   19021.9 1033.7963
Difference at 95.0% confidence
 1873.2 +/- 787.548
 10.9233% +/- 4.71741%
 (Student's t, pooled s = 838.178)

This measurement was made with n=10 loading /world/world_105_115 with a
warm disk cache, with a newly started server each time so that memory
fragmentation hasn't occurred. Time elapsed was measured with a
monotonic clock.

It looks like 1-object-per-alloc increases the median by 2 ms. For tiled
world maps, at worse case 5 maps are loaded at a time, for a total
additional latency of 10 ms, or about a 11% increase.


 I thought it was be slower, but for modern hardware, that is not very much 
(going back to the system that had 16 megs of memory, its CPU also run at 25 
mhz I think, so ignoring architectural differences, still clearly a lot 
faster).  Given those old system were maybe 100 times slower, that mean loading 
the map would take 200 ms, and given tick time hasn't really changed, that 
means lag.

 However, even in current architecture, this really isn't a problem - since 
objects are never freed, eventually the game gets to a point where it should 
enough have objects on the free list that it doesn't need to allocate objects 
very often.  So after the server first starts up, things may be sort of slow 
until it sort of reaches that point.

___
crossfire mailing list
crossfire@metalforge.org
http://mailman.metalforge.org/mailman/listinfo/crossfire


Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator

2020-03-11 Thread Kevin Zheng
On 3/10/20 9:56 PM, Mark Wedel wrote:
>  Not sure if still the case, but at one time, performance of allocating
> thousands of new objects individually increased lag (when loading new
> maps).  So doing it in blocks of 500 or whatever was faster.  At one
> point I had put in option to do individually allocation, simply for the
> reason above - if using a memory debugger, that block of objects failed
> to work.

Allocating one object at a time does make mapfile_load() slower (all
numbers in us):

x CF Object Allocator
+ 1 object per calloc()
+--+
|  x+  |
|x xx  x  x*x  x+   +  + + + +   ++|
|  |_A_M__|   |___A_M_||
+--+
N   Min   MaxMedian   AvgStddev
x  10 16154 17814   17309.5   17148.7 579.95633
+  10 17560 20576 19169   19021.9 1033.7963
Difference at 95.0% confidence
1873.2 +/- 787.548
10.9233% +/- 4.71741%
(Student's t, pooled s = 838.178)

This measurement was made with n=10 loading /world/world_105_115 with a
warm disk cache, with a newly started server each time so that memory
fragmentation hasn't occurred. Time elapsed was measured with a
monotonic clock.

It looks like 1-object-per-alloc increases the median by 2 ms. For tiled
world maps, at worse case 5 maps are loaded at a time, for a total
additional latency of 10 ms, or about a 11% increase.

-- 
Kevin Zheng
kevinz5...@gmail.com | kev...@berkeley.edu
XMPP: kev...@eecs.berkeley.edu
___
crossfire mailing list
crossfire@metalforge.org
http://mailman.metalforge.org/mailman/listinfo/crossfire


Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator

2020-03-10 Thread Kevin Zheng
Thanks, Mark, for the review.

On 3/10/20 9:56 PM, Mark Wedel wrote:
>> The object allocator allocates OBJ_EXPAND objects at a time and manages
>> its own free list. It is faster at allocating many objects at once, but
>> defeats memory debuggers and mitigation techniques. It also does not
>> return unused memory to the operating system.
> 
>  Not sure if still the case, but at one time, performance of allocating
> thousands of new objects individually increased lag (when loading new
> maps).  So doing it in blocks of 500 or whatever was faster.  At one
> point I had put in option to do individually allocation, simply for the
> reason above - if using a memory debugger, that block of objects failed
> to work.

I have not measured the latency loading maps. I'll measure this and
hopefully get back with some results.

>  Minor nit - even free() on a memory object does not return it to the OS
> - it tells malloc that block of memory is available again, and on a
> future call, malloc will re-use it, though you can get fragmentation
> (malloc may decide to use that memory for something smaller than an
> object).
This may be true for some allocators, but not jemalloc. Now, not all
allocators are jemalloc and that might be important to consider.

In jemalloc, unused dirty pages are kept around for re-allocation but
eventually "returned" to the operating system via madvise(...MADV_FREE).
top confirms that the server's resident memory can drop below its peak
resident memory when no swapping takes place. The jemalloc statistics
show the number of madvise() calls.

Linux appears to have gained MADV_FREE since 4.5, but I'm not sure if
the standard library allocator uses it.

>> object_free2() adds freed objects to a free list that is reclaimed at
>> the end of each server tick. It cannot immediately free object memory,
>> because some callers expect to be able to query it for FLAG_FREED to
>> detect object removal.
> 
>  querying for flag freed was always a bit buggy (it sort of relies that
> the given object won't get re-used, which works because the program is
> singled threaded and these freed objects will be last to be allocated
> when new objects are needed again).
> 
>  Also, as noted above, since memory is not returned to the OS, only the
> allocator, having crossfire itself keep a freelist makes perfect sense -
> it is going to need those objects again at some point, and objects are
> the main use of dynamic memory.   If the program did something like was
> in one state where it needed a bunch of malloc'd items, and did
> something with them, no longer needed them, but then needed a large
> amount of memory for a different object type, freeing them would make
> sense, because in that second allocation, it could re-use that freed up
> space.

I suppose this boils down to how much slower calloc() is compared to
grabbing from the internal freelist, and whether free() returns memory
on enough supported operating systems or not.

-- 
Kevin Zheng
kevinz5...@gmail.com | kev...@berkeley.edu
XMPP: kev...@eecs.berkeley.edu
___
crossfire mailing list
crossfire@metalforge.org
http://mailman.metalforge.org/mailman/listinfo/crossfire


Re: [crossfire] Proposal: retire MAX_OBJECTS and object allocator

2020-03-10 Thread Mark Wedel

On 3/9/20 11:24 PM, Kevin Zheng wrote:

Hi there,

After experiencing MAX_OBJECTS-related excessive swapping on Invidious
that allowed Saromok's souped-up goblins to poison and kill my
character, I've decided to take another look at memory management.

I may be missing some historical context for why things are the way they
currently are in Crossfire, so I welcome feedback on what these proposed
changes might do or not do.


 Crossfire goes way back (30+ years at this point), and the power/abilities of 
those machines back then were much more limited.  I think back then, the 
computer I was using had 16 megs of memory.  As such, things like memory 
consumption was much more a concern, and while memory usage has gone up (more 
stuff stored in the object, larger maps, etc), it has not gone up at the same 
pace as computer power has gone up.  So a lot of those old assumptions are no 
longer true.



The rationale for each change is included in each patch, but reproduced
here for convenience:


Subject: [PATCH 1/2] Retire MAX_OBJECTS

MAX_OBJECTS has probably outlived its usefulness. It was previously "no
hard limit," only serving to trigger map swapping immediately after
loading a map, if the number of used objects exceeded MAX_OBJECTS.

At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where
n is the number of maps in memory. This happens immediately after
loading a very large map. The server takes O(n) to search for the map
with the smallest remaining timeout and swaps it, and does this n times
or until enough memory is freed. If the other maps are small, this does
not free much memory and causes the "performance hit" mentioned in the
comments. This was observed on Invidious, where the server experienced
delays of up to 700 ms immediately after loading a large map due to
excessive swapping.

Removing MAX_OBJECTS does not significantly change the server's
allocation pattern because the server never frees memory (it maintains
an internal free list) and because maps are swapped out based on timeout
at the end of each tick anyway.


 This totally makes sense - as alluded to above, at one time, memory was 
constrained and one wanted to make sure crossfire did not use so much memory as 
to cause paging on the system it was running on.  In theory, MAX_OBJECTS would 
get tuned by each administrator, based on how much memory their server had, but 
I think that rarely happened.




Subject: [PATCH 2/2] Retire object allocator

The object allocator allocates OBJ_EXPAND objects at a time and manages
its own free list. It is faster at allocating many objects at once, but
defeats memory debuggers and mitigation techniques. It also does not
return unused memory to the operating system.


 Not sure if still the case, but at one time, performance of allocating 
thousands of new objects individually increased lag (when loading new maps).  
So doing it in blocks of 500 or whatever was faster.  At one point I had put in 
option to do individually allocation, simply for the reason above - if using a 
memory debugger, that block of objects failed to work.

 Minor nit - even free() on a memory object does not return it to the OS - it 
tells malloc that block of memory is available again, and on a future call, 
malloc will re-use it, though you can get fragmentation (malloc may decide to 
use that memory for something smaller than an object).



Rewrite object_new() to always allocate memory using calloc(). This will
incur some memory overhead since an object struct does not fit perfectly
into a nice allocator slab size.


 I suspect additional overhead should be pretty trivial, assuming the allocator 
is decent (and realistically, not much an issue anyways)



object_free2() adds freed objects to a free list that is reclaimed at
the end of each server tick. It cannot immediately free object memory,
because some callers expect to be able to query it for FLAG_FREED to
detect object removal.


 querying for flag freed was always a bit buggy (it sort of relies that the 
given object won't get re-used, which works because the program is singled 
threaded and these freed objects will be last to be allocated when new objects 
are needed again).

 Also, as noted above, since memory is not returned to the OS, only the 
allocator, having crossfire itself keep a freelist makes perfect sense - it is 
going to need those objects again at some point, and objects are the main use 
of dynamic memory.   If the program did something like was in one state where 
it needed a bunch of malloc'd items, and did something with them, no longer 
needed them, but then needed a large amount of memory for a different object 
type, freeing them would make sense, because in that second allocation, it 
could re-use that freed up space.

___
crossfire mailing list
crossfire@metalforge.org
http://mailman.metalforge.org/mailman/listinfo/crossfire


[crossfire] Proposal: retire MAX_OBJECTS and object allocator

2020-03-09 Thread Kevin Zheng
Hi there,

After experiencing MAX_OBJECTS-related excessive swapping on Invidious
that allowed Saromok's souped-up goblins to poison and kill my
character, I've decided to take another look at memory management.

I may be missing some historical context for why things are the way they
currently are in Crossfire, so I welcome feedback on what these proposed
changes might do or not do.

The rationale for each change is included in each patch, but reproduced
here for convenience:


Subject: [PATCH 1/2] Retire MAX_OBJECTS

MAX_OBJECTS has probably outlived its usefulness. It was previously "no
hard limit," only serving to trigger map swapping immediately after
loading a map, if the number of used objects exceeded MAX_OBJECTS.

At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where
n is the number of maps in memory. This happens immediately after
loading a very large map. The server takes O(n) to search for the map
with the smallest remaining timeout and swaps it, and does this n times
or until enough memory is freed. If the other maps are small, this does
not free much memory and causes the "performance hit" mentioned in the
comments. This was observed on Invidious, where the server experienced
delays of up to 700 ms immediately after loading a large map due to
excessive swapping.

Removing MAX_OBJECTS does not significantly change the server's
allocation pattern because the server never frees memory (it maintains
an internal free list) and because maps are swapped out based on timeout
at the end of each tick anyway.


Subject: [PATCH 2/2] Retire object allocator

The object allocator allocates OBJ_EXPAND objects at a time and manages
its own free list. It is faster at allocating many objects at once, but
defeats memory debuggers and mitigation techniques. It also does not
return unused memory to the operating system.

Rewrite object_new() to always allocate memory using calloc(). This will
incur some memory overhead since an object struct does not fit perfectly
into a nice allocator slab size.

object_free2() adds freed objects to a free list that is reclaimed at
the end of each server tick. It cannot immediately free object memory,
because some callers expect to be able to query it for FLAG_FREED to
detect object removal.


These changes were tested locally (mostly by running a character around)
with 1) clang -fsanitize=address, which did not report any errors, and
2) with Jemalloc statistics enabled (see attached, jemalloc.out).

As expected, after the change, the allocator bin seeing the most
activity was 768 bytes (struct object is 656 bytes on x86-64).

-- 
Kevin Zheng
kevinz5...@gmail.com | kev...@berkeley.edu
XMPP: kev...@eecs.berkeley.edu
From fe2480277fa6e78db591fac3787a5779cb4a01f4 Mon Sep 17 00:00:00 2001
From: Kevin Zheng 
Date: Mon, 9 Mar 2020 09:54:01 -0700
Subject: [PATCH 1/2] Retire MAX_OBJECTS

MAX_OBJECTS has probably outlived its usefulness. It was previously "no
hard limit," only serving to trigger map swapping immediately after
loading a map, if the number of used objects exceeded MAX_OBJECTS.

At worse case, MAX_OBJECTS causes an O(n^2) search and O(n) swaps, where
n is the number of maps in memory. This happens immediately after
loading a very large map. The server takes O(n) to search for the map
with the smallest remaining timeout and swaps it, and does this n times
or until enough memory is freed. If the other maps are small, this does
not free much memory and causes the "performance hit" mentioned in the
comments. This was observed on Invidious, where the server experienced
delays of up to 700 ms immediately after loading a large map due to
excessive swapping.

Removing MAX_OBJECTS does not significantly change the server's
allocation pattern because the server never frees memory (it maintains
an internal free list) and because maps are swapped out based on timeout
at the end of each tick anyway.
---
 include/config.h | 41 -
 include/sproto.h |  1 -
 server/c_misc.c  |  3 +--
 server/monster.c |  2 +-
 server/server.c  |  1 -
 server/swap.c| 53 
 6 files changed, 2 insertions(+), 99 deletions(-)

diff --git a/include/config.h b/include/config.h
index 4eb2ef92..7752c22a 100644
--- a/include/config.h
+++ b/include/config.h
@@ -336,8 +336,6 @@
  * DMFILE - file with dm/wizard access lists
  * LOGFILE - where to log if using -daemon option
  * MAP_ - various map timeout and swapping parameters
- * MAX_OBJECTS - how many objects to keep in memory.
- * MAX_OBJECTS_LWM - only swap maps out if below that value
  * MOTD - message of the day - printed each time someone joins the game
  * PERM_FILE - limit play times
  * SHUTDOWN - used when shutting down the server
@@ -427,45 +425,6 @@
 /** Default time to reset. */
 #define MAP_DEFAULTRESET   7200
 
-/**
- * MAX_OBJECTS is no hard limit.  If this limit is exceeded, Crossfire
- * will look for maps which are already scheldued for swap