The easy work described in Janne's email has been done:
https://github.com/SchedMD/slurm/commit/20004689197075a3dcab8bb946133a64820baa86
A review of all xmalloc references, changing them to xcalloc where
appropriate, would take a lot of effort for likely little benefit, so
it's probably not going to happen, even though it's arguably the
"proper" solution.
Quoting Janne Blomqvist <[email protected]>:
On 2014-03-21 23:19, Daniel M. Weeks wrote:
I started this email after reading the thread started by Hongjia Cao
"performance improvement for hostlist". After a bit of digging and
writing it grew beyond the issue pointed out in that thread so I thought
it best to begin a new one since it may take on a discussion of its own.
It also includes a good bit of ancillary information people may find
generally useful. I don't think the solution committed for the problem
Cao highlighted is correct. My proposed fix is in the second to last
paragraph although it somewhat depends on understanding the other items
that lead up to it.
It already looks long, doesn't it? Let's dive in...
First, the timing test provided by Cao is bogus, but it perfectly
illustrates the bigger problem Cao is trying to fix. (I'm not saying Cao
is wrong or that it was a bad test to run; there's something to be
learned, and it led to a more careful analysis.)
Second, a bit of background for those who are unfamiliar with
malloc/memset; skip this paragraph if you already know it. malloc takes
practically no time on Linux with glibc because it doesn't really do
much. It's not until the memory is actually accessed that it is truly
allocated by Linux. That's why there is such a discrepancy in the
timing; xmalloc is actually accessing the memory to zero it. This is
also why zeroing a bunch of memory unnecessarily can be a big waste of
time as Cao found. (A more realistic test would involve accessing a
portion of the memory between each xmalloc/malloc and xfree/free.)
It's also worth noting at this time that there is a fundamental
difference between malloc+memset and calloc. For a better explanation
than I can include in this space see Dietrich Epp's writeup on Stack
Overflow[1].
I'll say Cao is mostly right. It is probably better *not* to use xmalloc
here, since it potentially touches more memory than necessary, and
zeroing otherwise untouched memory is a waste. However, xmalloc does
more than just zero the memory; by switching to plain malloc those other
features get tossed.
The question becomes whether those features of xmalloc are important
enough to keep for this case. (I have made some notes about
additional problems to be corrected.) Let's take a look:
    void *slurm_xmalloc(size_t size,
                        const char *file, int line, const char *func)
    {
        void *new;
        int *p;
        xmalloc_assert(size >= 0 && size <= INT_MAX);
Good for debugging. However, size is type size_t. Comparison should be
against SIZE_MAX.
Yes, but if you compare against SIZE_MAX then the assert will never
fire, since size_t is unsigned, so the assert might as well be
removed then.
Perhaps the idea was that if something tries to allocate more than
INT_MAX, something has gone terribly wrong, since slurm typically
doesn't allocate very large objects, so you might as well abort at that
point? Then again, if so, why use INT_MAX and not some other magic
number? Furthermore, is xmalloc() the proper place for such checks,
and, even more so, are such checks actually useful in general?
        MALLOC_LOCK();
        p = (int *)malloc(size + 2*sizeof(int));
Don't cast malloc. Ever. It hides problems. The 2*sizeof(int) is a
problem too as seen a little later once we start assigning to it.
        MALLOC_UNLOCK();
*Great* on platforms with non-thread-safe malloc
MALLOC_LOCK and MALLOC_UNLOCK are actually macros, you can see the
definitions in the beginning of xmalloc.c. Note also that the
HAVE_UNSAFE_MALLOC macro that controls the expansion of those macros
is actually never set, so unless you have modified your source tree
to set it, MALLOC_LOCK/UNLOCK are no-ops.
That being said, I don't think there exists any remotely recent
system which supports threads but doesn't provide a thread-safe
malloc, so IMHO those macros should be removed.
and since we care about
an OOM condition and want to handle it once. Since Slurm is
multithreaded we don't want >1 OOM condition simultaneously.
Can this actually happen? Isn't the entire process killed
immediately once the OOM killer decides to kill it?
        p[0] = XMALLOC_MAGIC; /* add "secret" magic cookie */
        p[1] = (int)size;     /* store size in buffer */
Neat. The only problem is size_t and int aren't the same thing and this
feature would not work as intended on some architectures.
Yeah, it means that xmalloc doesn't work for allocations larger than
INT_MAX, though the xmalloc_assert previously already guards against
this. A simple fix would be to make "p" a size_t* instead of int*.
        new = &p[2];
        memset(new, 0, size);
Whoops. We can also get an OOM condition here and we can't catch it
thanks to Linux over-committing memory by default.
Yes, but there's not much one can do about it here
Now, there is the issue of malloc+memset vs. calloc. For instances where
it's *necessary* to zero the memory (namely ease of string manipulation,
struct-field initialization, or security), calloc *could* be used simply
because it's more efficient. However, given that xmalloc is trying to
handle OOM conditions, calloc also eliminates the memset call whose
failure cannot be caught. So not only *could* calloc be used for
efficiency, it *should* be used so that xmalloc functions as intended
(and handles a memory-allocation error more gracefully).
Now, all the way back to why this originally came up: I think a more
appropriate change is to fix xmalloc to use calloc, which should on its
own speed up cases like the one Cao was seeing, without changing
xmalloc's semantics.
If necessary, another malloc wrapper *could* be added that just uses
malloc without zeroing the memory. However, the only benefit over the
calloc-based wrapper would come when the system is under enough load
that the kernel has no pre-zeroed pages on hand and calloc incurs a
slight delay. I suspect that in such a case the system load itself would
be a bigger slowdown than calloc's small delay, so a non-zeroing malloc
wrapper offers no real benefit.
A "proper" fix would IMHO be to (in addition to the int/size_t
issues discussed above)
1) Create a calloc wrapper (xcalloc), and update all current users
of xmalloc to use that instead.
2) Remove the memset from xmalloc, so that xmalloc properly is a
malloc wrapper.
3) Audit users of xcalloc, change back to use xmalloc where zeroing
isn't needed.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || [email protected]