Re: malloc cleanup and small optimization (step 2)
On 30 December 2017 at 06:44, Otto Moerbeek wrote:
> On Sat, Dec 30, 2017 at 06:53:44AM +, kshe wrote:
>
>> Hi,
>>
>> Looking at this diff and the previous one, I found some more possible
>> cleanups for malloc.c (the patch below is to be applied after both of
>> them, even if the second one has not been committed yet):
>>
>> 1. In malloc_bytes(), use ffs(3) instead of manual loops, which on many
>> architectures boils down to merely one or two instructions (for example,
>> see src/lib/libc/arch/amd64/string/ffs.S).
>
> I remember doing some measurements using ffs a long time ago, and it
> turned out that it was slower in some cases, likely due to the function
> call overhead or maybe a non-optimal implementation. So I'm only
> willing to consider this after seeing benchmarks on a handful of
> architectures.

There's __builtin_ffs, but Clang / GCC might not be smart enough to use
it for ffs(3) automatically.
Re: malloc.c: better double free check
> In the end all double frees still will be caught by the actual free
> code, just with a delay. The delayed free buffer double free check is
> just a way of catching it as soon as possible to make debugging
> easier. That's the reason the original code could just do the check
> on the slot being replaced only.
>
> The only case that could be missed is when the chunk is given out by
> malloc in between the original free and the double free. But that
> case can never be caught in all circumstances since the delay buffer
> is of finite size.
>
> -Otto

True, the delay buffer currently only guarantees allocations are kept
out of circulation for one cycle, since the random choice is between
previously freed allocations, never the current one. It matters more
with the other change making half of the quarantine into a ring buffer
to provide a longer guaranteed delay. I think that makes sense as a
trade-off vs. an extra bit of entropy from a 2x larger random array for
a given total quarantine size.

It also improves the write-after-free detection, especially with a
configurable quarantine size, which makes it somewhat like the ASan
quarantine but with delayed detection of write-after-free and only
indirect read-after-free detection via junk filling (i.e. if something
ends up crashing / breaking from reading junk instead of what it
expected).
Re: malloc.c: better double free check
On Sat, 2017-09-23 at 09:32 +0200, Otto Moerbeek wrote:
> On Fri, Sep 22, 2017 at 04:35:39PM -0400, Daniel Micay wrote:
>
>> A linear search works well for the current small quarantine (16) but
>> won't work well if you ever want to have a larger / configurable
>> quarantine size. It would also be nice to make this fast enough to
>> enable by default.
>>
>> We (CopperheadOS) use an open addressed hash table for this based on
>> the existing hash table since we use a larger quarantine with a FIFO
>> queue alongside the random array and a configurable size. Ideally the
>> code would be shared with the existing hash table but I didn't want
>> to make it into an invasive change downstream.
>>
>> These are the three downstream patches for OpenBSD malloc in our copy
>> of Bionic (Android libc), so I'd need to port them to the current
>> upstream code to apply cleanly. They're currently applied after other
>> changes and it's a slightly older copy of the base code (after
>> multi-pool support, but before the canary rework since we'll need to
>> adapt that to our needs). You can get the general idea from the
>> patches even though they're not going to apply cleanly though.
>>
>> [1] quarantine double-free detection via hash table
>
> Thanks for sharing this, I'll take a look soon.
>
> Thinking a bit about this: wouldn't a closed hash table be sufficient?
> A collision would then either be a double free, otherwise just replace
> old with new. You'll get O(1) lookup and insert and simpler code.

I wouldn't really want to have a random chance of missing a double-free
even if the chance is small though. It's a copy of the hash table used
to track regions with minor tweaks, so it'd be possible to make the
existing code able to handle both, but it would mean turning it into
macros. I'm not sure which is the lesser evil for just 2 uses of a
fairly simple data structure.
Re: malloc.c: better double free check
A linear search works well for the current small quarantine (16) but
won't work well if you ever want to have a larger / configurable
quarantine size. It would also be nice to make this fast enough to
enable by default.

We (CopperheadOS) use an open addressed hash table for this based on the
existing hash table since we use a larger quarantine with a FIFO queue
alongside the random array and a configurable size. Ideally the code
would be shared with the existing hash table but I didn't want to make
it into an invasive change downstream.

These are the three downstream patches for OpenBSD malloc in our copy of
Bionic (Android libc), so I'd need to port them to the current upstream
code to apply cleanly. They're currently applied after other changes and
it's a slightly older copy of the base code (after multi-pool support,
but before the canary rework since we'll need to adapt that to our
needs). You can get the general idea from the patches even though
they're not going to apply cleanly though.
[1] quarantine double-free detection via hash table

diff --git a/libc/bionic/omalloc.c b/libc/bionic/omalloc.c
index 23af59501..4f4e1290c 100644
--- a/libc/bionic/omalloc.c
+++ b/libc/bionic/omalloc.c
@@ -157,6 +157,7 @@ struct dir_info {
 	struct region_info free_regions[MALLOC_MAXCACHE];
 	/* delayed free chunk slots */
 	void *delayed_chunks[MALLOC_DELAYED_CHUNK_MASK + 1];
+	void *delayed_chunks_set[(MALLOC_DELAYED_CHUNK_MASK + 1) * 4];
 	size_t rbytesused;		/* random bytes used */
 	char *func;			/* current function */
 	int mutex;
@@ -292,6 +293,22 @@ hash(void *p)
 	return sum;
 }
 
+static inline size_t
+hash_chunk(void *p)
+{
+	size_t sum;
+	uintptr_t u;
+
+	u = (uintptr_t)p >> MALLOC_MINSHIFT;
+	sum = u;
+	sum = (sum << 7) - sum + (u >> 16);
+#ifdef __LP64__
+	sum = (sum << 7) - sum + (u >> 32);
+	sum = (sum << 7) - sum + (u >> 48);
+#endif
+	return sum;
+}
+
 static inline struct dir_info *
 getpool(void)
 {
@@ -983,6 +1000,56 @@ delete(struct dir_info *d, struct region_info *ri)
 	}
 }
 
+void delayed_chunks_insert(struct dir_info *d, void *p)
+{
+	size_t index;
+	size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+	void *q;
+
+	index = hash_chunk(p) & mask;
+	q = d->delayed_chunks_set[index];
+	while (q != NULL) {
+		if (p == q)
+			wrterror(d, "double free", p);
+		index = (index - 1) & mask;
+		q = d->delayed_chunks_set[index];
+	}
+	d->delayed_chunks_set[index] = p;
+}
+
+void delayed_chunks_delete(struct dir_info *d, void *p)
+{
+	size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+	size_t i, j, r;
+	void *q;
+
+	i = hash_chunk(p) & mask;
+	q = d->delayed_chunks_set[i];
+	while (q != p) {
+		if (q == NULL)
+			wrterror(d, "pointer missing from address tracking table", p);
+		i = (i - 1) & mask;
+		q = d->delayed_chunks_set[i];
+	}
+
+	for (;;) {
+		d->delayed_chunks_set[i] = NULL;
+		j = i;
+		for (;;) {
+			i = (i - 1) & mask;
+			if (d->delayed_chunks_set[i] == NULL)
+				return;
+			r = hash_chunk(d->delayed_chunks_set[i]) & mask;
+			if ((i <= r && r < j) || (r < j && j < i) ||
+			    (j < i && i <= r))
+				continue;
+			d->delayed_chunks_set[j] = d->delayed_chunks_set[i];
+			break;
+		}
+	}
+}
+
 /*
  * Allocate a page of chunks
  */
@@ -1474,11 +1541,21 @@ ofree(struct dir_info *argpool, void *p)
 	if (!mopts.malloc_freenow) {
 		if (find_chunknum(pool, r, p) == (uint32_t)-1)
 			goto done;
+
+		if (p == NULL)
+			goto done;
+
+		delayed_chunks_insert(pool, p);
+
 		i = getrbyte(pool) & MALLOC_DELAYED_CHUNK_MASK;
 		tmp = p;
 		p = pool->delayed_chunks[i];
-		if (tmp == p)
-			wrterror(pool, "double free", p);
+
+		if (p == NULL)
+			goto done;
+
+		delayed_chunks_delete(pool, p);
+
 		if (mopts.malloc_junk)
 			validate_junk(pool, p);
 		pool->delayed_chunks[i] = tmp;
@@ -2264,6 +2341,7 @@ malloc_dump(int fd, struct dir_info *pool)
 		r = find(pool, p);
 		if (r == NULL)
 			wrterror(pool, "bogus pointer in
Re: random malloc junk
On Tue, 2016-09-13 at 13:27 +0200, Otto Moerbeek wrote:
> On Thu, Sep 08, 2016 at 06:42:33PM -0400, Daniel Micay wrote:
>
>> A bit off-topic: 'J' enables junk-on-init which is for debugging, but
>> it also currently has security improvements for large allocations.
>> There's only partial junk-on-free by default (half a page), and 'U'
>> disables large allocation junk-on-free without 'J'. I think it would
>> make sense to remove those optimizations since it's fine if the cost
>> scales up with larger allocations and losing the guarantee of not
>> leaking data via uninitialized memory with 'U' is not great. Using
>> 'U' is quite expensive regardless, and adds some pathological
>> performance cases for small size allocations which is more important.
>> I ended up removing both of those optimizations for the CopperheadOS
>> port.
>
> I would prefer to see a diff with this. For me, that should be easier
> to understand than your description.

This is the diff from the CopperheadOS port which won't apply directly
to malloc.c in OpenBSD, but it should explain what I mean since it's
just a few lines. Just ignore the part where it removes malloc_junk=2,
which is because junk-on-init is split out (so this obsoleted the extra
mode).

The current OpenBSD code only wipes up to MALLOC_MAXCHUNK with junk at
level 1, and it similarly doesn't wipe at all with 'U' (even though
junk-on-free also serves the purpose of preventing information leaks,
not just mitigating use-after-free). IMO, optimizing large allocation
performance like this isn't worthwhile.
diff --git a/libc/bionic/omalloc.c b/libc/bionic/omalloc.c
index e451d79..9277ee7 100644
--- a/libc/bionic/omalloc.c
+++ b/libc/bionic/omalloc.c
@@ -504,7 +504,7 @@ map(struct dir_info *d, void *hint, size_t sz, int zero_fill)
 			madvise(p, sz, MADV_NORMAL);
 		if (zero_fill)
 			memset(p, 0, sz);
-		else if (mopts.malloc_junk == 2 &&
+		else if (mopts.malloc_junk &&
 		    mopts.malloc_freeunmap)
 			memset(p, SOME_FREEJUNK, sz);
 		return p;
@@ -524,7 +524,7 @@ map(struct dir_info *d, void *hint, size_t sz, int zero_fill)
 		d->free_regions_size -= psz;
 		if (zero_fill)
 			memset(p, 0, sz);
-		else if (mopts.malloc_junk == 2 && mopts.malloc_freeunmap)
+		else if (mopts.malloc_junk && mopts.malloc_freeunmap)
 			memset(p, SOME_FREEJUNK, sz);
 		return p;
 	}
@@ -603,7 +603,7 @@ omalloc_parseopt(char opt)
 		mopts.malloc_junk = 0;
 		break;
 	case 'J':
-		mopts.malloc_junk = 2;
+		mopts.malloc_junk = 1;
 		break;
 	case 'i':
 		mopts.malloc_junk_init = 0;
@@ -1517,8 +1517,7 @@ ofree(struct dir_info *pool, void *p)
 			STATS_SUB(pool->malloc_guarded, mopts.malloc_guard);
 		}
 		if (mopts.malloc_junk && !mopts.malloc_freeunmap) {
-			size_t amt = mopts.malloc_junk == 1 ? MALLOC_MAXCHUNK :
-			    PAGEROUND(sz) - mopts.malloc_guard;
+			size_t amt = PAGEROUND(sz) - mopts.malloc_guard;
 			memset(p, SOME_FREEJUNK, amt);
 		}
 		unmap(pool, p, PAGEROUND(sz));
Re: random malloc junk
A nice security property of 0xdf filling is that a use-after-free of a pointer is guaranteed to fault in a typical environment since it ends up pointing outside userspace (I assume that's the case on OpenBSD). A heap spray could potentially allow exploiting a random pointer. Perhaps it would be better if only the byte range guaranteeing faults for pointers was used? Less random, but strictly better than the current situation rather than losing a nice guarantee.
Re: random malloc junk
> BTW, we should revisit canaries and work further on moving them
> closer to requested size. There's a chance this diff will complicate
> that.

Can switch the canary code to memcpy/memcmp (to handle unaligned
canaries) and could then use the last byte as an index to the start of
the canary. For larger movement, it could have a special value (like
255) meaning that there's a 4 byte length at an 8 byte offset. If you
really wanted, you could micro-optimize to lose only 1 canary bit, but
that seems too complicated to save 7 bits of the canary. It could also
sanity check the offsets based on the known minimum size of a chunk in
that size class.
Re: random malloc junk
On Wed, 2016-09-07 at 18:29 -0400, Ted Unangst wrote:
> Instead of always using a fixed byte pattern, I think malloc should
> use a random pattern. Now, this sometimes means it's harder to
> identify exactly what's used after free, so we should provide a means
> to get the old 0xdf pattern back.
>
> Since we already have two junk modes, I thought I'd carry on along
> those lines. The default junk behavior, for free chunks only, is more
> of a security measure. I think this means we want random junk. The
> second level 'J' junk is more of a debugging tool, so that retains
> 0xdf.

A bit off-topic: 'J' enables junk-on-init which is for debugging, but it
also currently has security improvements for large allocations. There's
only partial junk-on-free by default (half a page), and 'U' disables
large allocation junk-on-free without 'J'. I think it would make sense
to remove those optimizations since it's fine if the cost scales up with
larger allocations and losing the guarantee of not leaking data via
uninitialized memory with 'U' is not great. Using 'U' is quite expensive
regardless, and adds some pathological performance cases for small size
allocations which is more important. I ended up removing both of those
optimizations for the CopperheadOS port.
malloc: add full delayed chunk double-free detection
This uses a hash table to maintain a set of delayed allocations,
allowing full and efficient double-free detection. The current code can
only catch a double-free when the two pointers being swapped are equal,
so double-frees that could be caught are missed. A naive loop over every
delayed chunk would work fine with the current 16 slot array, but that
would make scaling up the number of delayed allocations much more
costly.

I'm using this with 2 other changes making the size configurable and
adding a FIFO ring buffer in front of the randomized array of the same
size, to get a guaranteed minimum delay longer than the current single
free cycle. The randomization still works fine and it ends up providing
a nice balance. It's essentially equivalent to the quarantine feature in
Valgrind/ASan, but suitable for production and with weaker
use-after-free detection. It mixes well with the earlier change to
detect write-after-free via junk validation here.

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index bc328d2..9e8bd16 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -118,6 +118,7 @@ struct dir_info {
 	struct region_info free_regions[MALLOC_MAXCACHE];
 	/* delayed free chunk slots */
 	void *delayed_chunks[MALLOC_DELAYED_CHUNK_MASK + 1];
+	void *delayed_chunks_set[(MALLOC_DELAYED_CHUNK_MASK + 1) * 2];
 	size_t rbytesused;		/* random bytes used */
 	char *func;			/* current function */
 	u_char rbytes[32];		/* random bytes */
@@ -249,6 +250,22 @@ hash(void *p)
 	return sum;
 }
 
+static inline size_t
+hash_chunk(void *p)
+{
+	size_t sum;
+	uintptr_t u;
+
+	u = (uintptr_t)p >> MALLOC_MINSHIFT;
+	sum = u;
+	sum = (sum << 7) - sum + (u >> 16);
+#ifdef __LP64__
+	sum = (sum << 7) - sum + (u >> 32);
+	sum = (sum << 7) - sum + (u >> 48);
+#endif
+	return sum;
+}
+
 static void
 wrterror(struct dir_info *d, char *msg, void *p)
 {
@@ -864,6 +881,58 @@ delete(struct dir_info *d, struct region_info *ri)
 	}
 }
 
+void delayed_chunks_insert(struct dir_info *d, void *p) {
+	size_t index;
+	size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+	void *q;
+
+	index = hash_chunk(p) & mask;
+	q = d->delayed_chunks_set[index];
+	while (q != NULL) {
+		if (p == q) {
+			wrterror(d, "double free", p);
+			return;
+		}
+		index = (index - 1) & mask;
+		q = d->delayed_chunks_set[index];
+	}
+	d->delayed_chunks_set[index] = p;
+}
+
+void delayed_chunks_delete(struct dir_info *d, void *p) {
+	size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+	size_t i, j, r;
+	void *q;
+
+	i = hash_chunk(p) & mask;
+	q = d->delayed_chunks_set[i];
+	while (q != p) {
+		if (q == NULL) {
+			wrterror(d, "pointer missing from address tracking table", p);
+			return;
+		}
+		i = (i - 1) & mask;
+		q = d->delayed_chunks_set[i];
+	}
+
+	for (;;) {
+		d->delayed_chunks_set[i] = NULL;
+		j = i;
+		for (;;) {
+			i = (i - 1) & mask;
+			if (d->delayed_chunks_set[i] == NULL)
+				return;
+			r = hash_chunk(d->delayed_chunks_set[i]) & mask;
+			if ((i <= r && r < j) || (r < j && j < i) ||
+			    (j < i && i <= r))
+				continue;
+			d->delayed_chunks_set[j] = d->delayed_chunks_set[i];
+			break;
+		}
+	}
+}
+
 /*
  * Allocate a page of chunks
  */
@@ -1315,13 +1384,21 @@ ofree(struct dir_info *pool, void *p)
 	if (!mopts.malloc_freenow) {
 		if (find_chunknum(pool, r, p) == -1)
 			return;
+
+		if (p == NULL)
+			return;
+
+		delayed_chunks_insert(pool, p);
+
 		i = getrbyte(pool) & MALLOC_DELAYED_CHUNK_MASK;
 		tmp = p;
 		p = pool->delayed_chunks[i];
-		if (tmp == p) {
-			wrterror(pool, "double free", p);
+
+		if (p == NULL)
 			return;
-		}
+
+		delayed_chunks_delete(pool, p);
+
 		if (mopts.malloc_junk)
 			validate_junk(pool, p);
 		pool->delayed_chunks[i] = tmp;
@@ -1950,6 +2027,7 @@ malloc_dump(int fd)
 		wrterror(pool, "bogus pointer in malloc_dump", p);
 		continue;
 	}
+
Re: malloc: fix setting errno on out-of-memory
Sorry, my mail client was being stupid. Let's try again in good old
mutt:

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index bc328d2..aa6f0a3 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -1224,8 +1224,9 @@ malloc(size_t size)
 	r = omalloc(d, size, 0, CALLER);
 	d->active--;
 	_MALLOC_UNLOCK();
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1510,8 +1511,9 @@ realloc(void *ptr, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1563,8 +1565,9 @@ calloc(size_t nmemb, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1694,10 +1697,9 @@ posix_memalign(void **memptr, size_t alignment, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 	if (r == NULL) {
-		if (mopts.malloc_xmalloc) {
+		if (mopts.malloc_xmalloc)
 			wrterror(d, "out of memory", NULL);
-			errno = ENOMEM;
-		}
+		errno = ENOMEM;
 		goto err;
 	}
 	errno = saved_errno;
malloc: fix setting errno on out-of-memory
The ENOMEM errno wasn't being set in some code paths where it was gated
behind mopts.malloc_xmalloc:

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index bc328d2..aa6f0a3 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -1224,8 +1224,9 @@ malloc(size_t size)
 	r = omalloc(d, size, 0, CALLER);
 	d->active--;
 	_MALLOC_UNLOCK();
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1510,8 +1511,9 @@ realloc(void *ptr, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1563,8 +1565,9 @@ calloc(size_t nmemb, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 
-	if (r == NULL && mopts.malloc_xmalloc) {
-		wrterror(d, "out of memory", NULL);
+	if (r == NULL) {
+		if (mopts.malloc_xmalloc)
+			wrterror(d, "out of memory", NULL);
 		errno = ENOMEM;
 	}
 	if (r != NULL)
@@ -1694,10 +1697,9 @@ posix_memalign(void **memptr, size_t alignment, size_t size)
 	d->active--;
 	_MALLOC_UNLOCK();
 	if (r == NULL) {
-		if (mopts.malloc_xmalloc) {
+		if (mopts.malloc_xmalloc)
 			wrterror(d, "out of memory", NULL);
-			errno = ENOMEM;
-		}
+		errno = ENOMEM;
 		goto err;
 	}
 	errno = saved_errno;
--
2.8.2
Re: multi-pool malloc wip diff
> - Currently fixed at 4 pools with a fixed thread -> pool mapping.
> - All pools are always initialized, even for single threaded programs,
>   where only one pool is used.
> - Especially realloc gets quite a bit uglier.
> - I'm pondering storing the thread -> pool mapping in the thread
>   struct instead of computing it each time from the tcb address.

If you wanted to get rid of the static mapping of threads to pools, it
would make sense to replace the per-pool region tables with a global,
2-tier table. The first tier would be a table of mutexes paired with the
current style region tables, and it would pick the right one using a
secondary hash. It could be much larger than the number of pools to
avoid it being a bottleneck. This would eliminate the free / realloc
loops.

I don't think it would make sense right now, since it would add an extra
pair of lock / unlock calls to each malloc / free call in the common
case where allocations are freed by the allocating thread. If the
thread -> pool assignment was dynamic, it would be faster due to not
needing those loops. So if you wanted to do something like picking a
random pool each time, this would make a lot of sense. It would be
increasingly better with a larger number of pools.
Re: refine canaries
On Wed, 2015-12-09 at 07:47 -0500, Ted Unangst wrote:
> This is a kind of two steps forward, one step back diff.
>
> I would like for the canary to be placed directly adjacent to the end
> of the user specified size. No slack. To accomplish this, we record
> the original size of the allocation at the end, then we can walk
> backwards to find the canary. Mechanically, this requires passing the
> real size down a little deeper in some places.

Makes sense. The goal was reliably catching small overflows, but the
size class rounding proved to be a major problem (especially with the
coarse power of two size classes). And it doesn't seem like it would
reduce the potential for using this as a security feature.

> I also changed the canaries to be entirely random, with the correct
> value stored in each chunk. This is a security tradeoff, since now an
> attacker can overwrite the whole works predictably. But I consider
> this more of a debugging feature than a security feature, so I'd
> prefer to greatly improve the ability to catch one byte overwrites.

A fair trade. An alternative to this would be storing a random value in
the metadata for the chunk pages. That would avoid needing to spend more
memory for the inline random value and wouldn't have a negative security
impact. It wasn't really meant for protecting against sequential
overflows since it would usually detect them too late, but ideally it
could still work as a defence against them in some cases. Not quite as
good as a random value per canary, but a random value per page would
already be pretty fine-grained, and each canary can still be made unique
via something like the current `random ^ hash(address)`.

There also might be a much better way of generating the canaries from
the random value. I just copied the existing pattern and added the hash
call to make it a little less bad. If performance wasn't a concern, it
could simply use SipHash but... it's way too slow.
Hiding the seed from people able to leak the canary value would only require encryption but ideally it would actually be a proper MAC (that just seems unrealistic).
Re: malloc canaries and validation
It would be great to land this simple initial implementation and move
from there. I have ideas on how to make these features better, but I'm
wary of doing a lot of work out-of-tree. If it lands in some form, that
would go a long way to encouraging further work on it. I basically just
don't want to end up maintaining a whole allocator implementation.

For detecting UAF, the ideal would be having a FIFO ring buffer placed
in front of the randomized delayed chunk array. The delay would then
depend on the amount of memory that's set aside for this. For example,
delaying up to 4M of small allocations (which is a lot, but totally
doable with modern hardware) would go a long way to catching most writes
after free via this junk validation feature. Could potentially be
configurable as the cache size is.

It also becomes increasingly more useful to check the currently delayed
allocations at exit rather than just when they get flushed out of the
delayed array (and/or queue) if there's actually a longer form of delay.
It isn't all that useful right now, but it would be a valuable component
of this if it was fancier. Right now it basically relies on luck to
detect anything, unless the allocation is immediately written to after a
free and before the next free. It's a place where a mix of determinism
and non-determinism would be best, i.e. a guaranteed amount of delay,
but without losing the randomization since that is useful for other
reasons (randomized heap layout).

The malloc canaries mostly just don't play well with the current power
of two size classes. They'd be very cheap if the size classes were finer
grained, i.e. accepting more external fragmentation but reducing the
internal fragmentation (wasted padding). There's a nice technique used
in jemalloc based on spacing classes rather than size classes, but it
isn't appropriate in OpenBSD malloc because it would lose the nice
property of stuff being isolated based on size (and it would really be
best to *expand* that property more towards what PartitionAlloc does
rather than losing it).
Re: malloc canaries and validation
On Sun, 2015-12-06 at 17:10 -0700, Theo de Raadt wrote:
> Kept out of circulation? It sounds like it would be incredibly
> expensive data-structure wise for the kernel to even attempt such a
> guarantee..

I was just expecting it to be a FIFO ring buffer. It would have a limit
on the number of mappings it can delay along with a maximum upper bound
on the total mapping size (and it would flush the oldest mappings out to
stay under that threshold). It would increase virtual memory
fragmentation so it could be expensive.

I was thinking about it from a userspace perspective, which is why I had
mentioned purging the delayed mappings with MADV_FREE. It wouldn't make
sense to do it in userspace though. Simply unmapping right away and
approaching it in the kernel is better.

> So what is the purpose of your proposal? Is it to opportunistically
> find bugs? Or to resist attack in UAF situations by "guaranteeing"
> the memory is unallocated at time of reuse?

Guaranteeing that the memory is not reused until a given amount of
allocator throughput happens, and reliable immediate detection of the
access since it will be guaranteed to fault.

> Maybe you are on a system where mmap ASLR isn't as good? The OpenBSD
> one isn't perfect, because we are trying to maintain a small amount of
> page-table locality (so that N-level page tables don't have to be
> walked fully from root for every access that misses in the TLB).

Yeah, the Linux kernel's mmap ASLR only randomizes the base, so I'm not
getting the same security properties as OpenBSD malloc has on OpenBSD. I
could implement fine-grained randomization in the kernel for 64-bit
(Android has no virtual memory to spare on 32-bit), but I thought a
delay might be useful even with that in place as it fully prevents reuse
for a given period of "time" (allocator throughput). It depends on how
aggressive the mmap randomization is and perhaps on the level of control
that an attacker can exert over memory allocation.
Re: malloc canaries and validation
It would also be interesting to try out a more aggressive form of freeunmap for 64-bit where the allocations are purged with MADV_FREE and then the virtual memory is kept out of circulation with a similar FIFO queue approach. Could potentially do it by default when malloc hints are enabled, so it wouldn't need a new option exposed (but it would change the MADV_FREE option into something that enhances security at the expense of more VM fragmentation rather than a performance vs. memory trade-off so that may not make much sense after all). It's the same issue as the junk validation feature where there's a need for a reliable delay to get the most out of the feature. Randomization does help, but it's not as good as knowing that virtual memory doesn't go back into circulation until some configured amount of allocator throughput has occurred.
Re: enhanced use-after-free detection for malloc v2
On 02/11/15 06:40 AM, Theo Buehler wrote:
> Sorry for this rather long mail:
>
> I have three small comments on the patch itself
> (starting 80 lines below).
>
> For those who want to try both new features, I attached a patch against
> -current that merges the three parts of Daniel's diff (plus the trivial
> two of the nits below) at the very end of this mail.
>
> In addition to Daniel's test program, here's a small program that
> in 1000 test runs on amd64 always triggered an abort at the second
> free().
>
> $ cat heap.c
> #include <err.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int
> main(void)
> {
> 	char *p;
>
> 	p = malloc(1);
> 	if (p == NULL)
> 		err(1, "malloc");
>
> 	p = strcpy(p, "1234567");
> 	printf("%s\n", p);
> 	free(p);
>
> 	p = malloc(1);
> 	p = strcpy(p, "12345678");
> 	printf("%s\n", p);
> 	free(p);
>
> 	return 0;
> }
> $ cc -o heap heap.c
> /tmp//ccHH1E1h.o: In function `main':
> heap.c:(.text+0x6b): warning: warning: strcpy() is almost always
> misused, please use strlcpy()
> $ ./heap
> 1234567
> 12345678
> heap(5992) in free(): error: chunk canary corrupted 0x1f4619f0e9c0
> Abort trap (core dumped)
> $
>
> The performance impact of MALLOC_OPTIONS='CSV' is very noticeable,
> but 'CV' is negligible and 'CJV' makes the system feel a bit less
> snappy on my machine, but the actual effect seems almost negligible.
>
> Here's a rough 'benchmark':
>
> Building the base system on my i7 Thinkpad T420 with 4G RAM,
> with the various malloc options, each with a freshly booted system.
>
> defaults:	31m04
> CV:		33m16
> CJV:		33m46
> CSV:		53m32
>
> Unfortunately, I have yet to hit an actual bug with this, one that
> isn't triggered by simple test programs. Casual browsing with firefox,
> even watching movies with mplayer is possible with
> MALLOC_OPTIONS='CSV'.

It would help quite a bit to bump up the size of the quarantine. The
default 16 slot one doesn't keep around the allocations for very long.
A more expensive version of the feature could also check for either zero
or junk data before allocating anything. Haven't decided how to expose
that via the option though.

Most bugs triggering in fairly normal situations with Firefox, etc. have
probably already been found by Valgrind though. The strength of features
like this in malloc is that they can be enabled all the time with little
performance loss, so eventually they'll find new bugs (which has been
the case for us on Android, but perhaps that's mostly because many apps
were never tested at all with ASan/Valgrind).

It may also be viable as a security feature in some situations, not just
a way of finding bugs. Depends on how far away the UAF is from the free
call, since one other free is all that's needed to lose reliable
detection. It might work better with a FIFO ring buffer rather than the
current fully randomized array (perhaps a mix? dunno).

> On Sun, Nov 01, 2015 at 02:56:11PM -0500, Daniel Micay wrote:
>> @@ -560,6 +563,12 @@ omalloc_init(struct dir_info **dp)
>> 	case 'J':
>> 		mopts.malloc_junk = 2;
>> 		break;
>> +	case 'v':
>> +		mopts.malloc_validate = 0;
>> +		break;
>> +	case 'V':
>> +		mopts.malloc_validate = 1;
>> +		break;
>> 	case 'n':
>> 	case 'N':
>> 		break;
>
> I'd keep the alphabetical order in the switch.
>
>> @@ -608,6 +617,9 @@ omalloc_init(struct dir_info **dp)
>> 	}
>> }
>>
>> +	if (!mopts.malloc_junk)
>> +		mopts.malloc_validate = 0;
>> +
>
> Wouldn't it make sense to just let 'V' imply 'J' (and 'j' imply 'v')
> as is already done with 'F' and 'U', respectively?

The V option doesn't benefit from J since it's only able to check for
use-after-frees in small allocations. I don't think there's a good
reason to tie them together.

A side note is that it would be nice if junk freeing was split up into
the security feature (full junk fill on free) and the debugging feature
(full junk fill on init).
I ended up doing that in CopperheadOS because it makes J significantly cheaper without giving up the nice security properties gained by sanitizing everything on free. Might be too late to change the meaning in OpenBSD though. >> @@ -1190,6 +1208,35 @@ malloc(size_t size) >> /*DEF_STRONG(malloc);*/ >
Re: enhanced use-after-free detection for malloc v2
(without mangling it this time...) diff --git a/stdlib/malloc.c b/stdlib/malloc.c index 424dd77..c408594 100644 --- a/stdlib/malloc.c +++ b/stdlib/malloc.c @@ -182,6 +182,7 @@ struct malloc_readonly { int malloc_freeunmap; /* mprotect free pages PROT_NONE? */ int malloc_hint;/* call madvice on free pages? */ int malloc_junk;/* junk fill? */ + int malloc_validate;/* validate junk */ int malloc_move;/* move allocations to end of page? */ int malloc_realloc; /* always realloc? */ int malloc_xmalloc; /* xmalloc behaviour? */ @@ -218,6 +219,8 @@ static void malloc_exit(void); #define CALLER NULL #endif +static void validate_delayed_chunks(void); + /* low bits of r->p determine size: 0 means >= page size and p->size holding * real size, otherwise r->size is a shift count, or 1 for malloc(0) */ @@ -560,6 +563,12 @@ omalloc_init(struct dir_info **dp) case 'J': mopts.malloc_junk = 2; break; + case 'v': + mopts.malloc_validate = 0; + break; + case 'V': + mopts.malloc_validate = 1; + break; case 'n': case 'N': break; @@ -608,6 +617,9 @@ omalloc_init(struct dir_info **dp) } } + if (!mopts.malloc_junk) + mopts.malloc_validate = 0; + #ifdef MALLOC_STATS if (mopts.malloc_stats && (atexit(malloc_exit) == -1)) { static const char q[] = "malloc() warning: atexit(2) failed." @@ -616,6 +628,12 @@ omalloc_init(struct dir_info **dp) } #endif /* MALLOC_STATS */ + if (mopts.malloc_validate && (atexit(validate_delayed_chunks) == -1)) { + static const char q[] = "malloc() warning: atexit(2) failed." 
+ " Will not be able to check for use after free\n"; + write(STDERR_FILENO, q, sizeof(q) - 1); + } + while ((mopts.malloc_canary = arc4random()) == 0) ; @@ -1190,6 +1208,35 @@ malloc(size_t size) /*DEF_STRONG(malloc);*/ static void +validate_junk(void *p) { + struct region_info *r; + struct dir_info *pool = getpool(); + size_t byte, sz; + if (p == NULL) + return; + r = find(pool, p); + if (r == NULL) + wrterror("bogus pointer in validate_junk", p); + REALSIZE(sz, r); + for (byte = 0; byte < sz; byte++) { + if (((char *)p)[byte] != SOME_FREEJUNK) { + wrterror("use after free", p); + return; + } + } +} + +static void +validate_delayed_chunks(void) { + struct dir_info *pool = getpool(); + int i; + if (pool == NULL) + return; + for (i = 0; i < MALLOC_DELAYED_CHUNK_MASK + 1; i++) + validate_junk(pool->delayed_chunks[i]); +} + +static void ofree(void *p) { struct dir_info *pool = getpool(); @@ -1253,6 +1300,8 @@ ofree(void *p) wrterror("double free", p); return; } + if (mopts.malloc_validate) + validate_junk(p); pool->delayed_chunks[i] = tmp; } if (p != NULL) {
Re: enhanced use-after-free detection for malloc v2
On 26/10/15 04:19 PM, Daniel Micay wrote: > This is an improved revision of my earlier patch. > > It now validates the junk data in the delayed_chunks array in an atexit > handler > too, rather than just when allocations are swapped out. > > It will now catch this simple UAF 100% of the time: > > #include > #include > > int main(void) { > size_t i; > char *p; > for (i = 0; i < 32; i++) { > p = malloc(16); > if (!p) return 1; > } > > p = malloc(16); > if (!p) return 1; > free(p); > *p = 5; > > for (i = 0; i < 4; i++) { > p = malloc(16); > if (!p) return 1; > free(p); > } > return 0; > } > > In general, it depends on the allocation still being in the delayed chunks > array when the use-after-free happens. This means a larger delayed chunks > array would improve the detection rate. That revision was missing a NULL check as it got lost when refactoring and the test cases weren't short enough to trigger it. Properly working implementation: diff --git a/stdlib/malloc.c b/stdlib/malloc.c index 424dd77..c408594 100644 --- a/stdlib/malloc.c +++ b/stdlib/malloc.c @@ -182,6 +182,7 @@ struct malloc_readonly { int malloc_freeunmap; /* mprotect free pages PROT_NONE? */ int malloc_hint;/* call madvice on free pages? */ int malloc_junk;/* junk fill? */ + int malloc_validate;/* validate junk */ int malloc_move;/* move allocations to end of page? */ int malloc_realloc; /* always realloc? */ int malloc_xmalloc; /* xmalloc behaviour? 
*/ @@ -218,6 +219,8 @@ static void malloc_exit(void); #define CALLER NULL #endif +static void validate_delayed_chunks(void); + /* low bits of r->p determine size: 0 means >= page size and p->size holding * real size, otherwise r->size is a shift count, or 1 for malloc(0) */ @@ -560,6 +563,12 @@ omalloc_init(struct dir_info **dp) case 'J': mopts.malloc_junk = 2; break; + case 'v': + mopts.malloc_validate = 0; + break; + case 'V': + mopts.malloc_validate = 1; + break; case 'n': case 'N': break; @@ -608,6 +617,9 @@ omalloc_init(struct dir_info **dp) } } + if (!mopts.malloc_junk) + mopts.malloc_validate = 0; + #ifdef MALLOC_STATS if (mopts.malloc_stats && (atexit(malloc_exit) == -1)) { static const char q[] = "malloc() warning: atexit(2) failed." @@ -616,6 +628,12 @@ omalloc_init(struct dir_info **dp) } #endif /* MALLOC_STATS */ + if (mopts.malloc_validate && (atexit(validate_delayed_chunks) == -1)) { + static const char q[] = "malloc() warning: atexit(2) failed." + " Will not be able to check for use after free\n"; + write(STDERR_FILENO, q, sizeof(q) - 1); + } + while ((mopts.malloc_canary = arc4random()) == 0) ; @@ -1190,6 +1208,35 @@ malloc(size_t size) /*DEF_STRONG(malloc);*/ static void +validate_junk(void *p) { + struct region_info *r; + struct dir_info *pool = getpool(); + size_t byte, sz; + if (p == NULL) + return; + r = find(pool, p); + if (r == NULL) + wrterror("bogus pointer in validate_junk", p); + REALSIZE(sz, r); + for (byte = 0; byte < sz; byte++) { + if (((char *)p)[byte] != SOME_FREEJUNK) { + wrterror("use after free", p); + return; + } + } +} + +static void +validate_delayed_chunks(void) { + struct dir_info *pool = getpool(); + int i; + if (pool == NULL) + return; + for (i = 0; i < MALLOC_DELAYED_CHUNK_MASK + 1; i++) + validate_junk(pool->delayed_chunks[i]); +} + +static void ofree(void *p) { struct dir_info *pool = getpool(); @@ -1253,6 +1300,8 @@ ofree(void *p) wrterror("double free", p); return; } + if (mopts.malloc_validate) + validate_junk(p); 
pool->delayed_chunks[i] = tmp; } if (p != NULL) {
enhanced use-after-free detection for malloc v2
This is an improved revision of my earlier patch. It now validates the junk data in the delayed_chunks array in an atexit handler too, rather than just when allocations are swapped out. It will now catch this simple UAF 100% of the time: #include #include int main(void) { size_t i; char *p; for (i = 0; i < 32; i++) { p = malloc(16); if (!p) return 1; } p = malloc(16); if (!p) return 1; free(p); *p = 5; for (i = 0; i < 4; i++) { p = malloc(16); if (!p) return 1; free(p); } return 0; } In general, it depends on the allocation still being in the delayed chunks array when the use-after-free happens. This means a larger delayed chunks array would improve the detection rate. diff --git a/stdlib/malloc.c b/stdlib/malloc.c index 424dd77..4a635b6 100644 --- a/stdlib/malloc.c +++ b/stdlib/malloc.c @@ -182,6 +182,7 @@ struct malloc_readonly { int malloc_freeunmap; /* mprotect free pages PROT_NONE? */ int malloc_hint;/* call madvice on free pages? */ int malloc_junk;/* junk fill? */ + int malloc_validate;/* validate junk */ int malloc_move;/* move allocations to end of page? */ int malloc_realloc; /* always realloc? */ int malloc_xmalloc; /* xmalloc behaviour? */ @@ -218,6 +219,8 @@ static void malloc_exit(void); #define CALLER NULL #endif +static void validate_delayed_chunks(void); + /* low bits of r->p determine size: 0 means >= page size and p->size holding * real size, otherwise r->size is a shift count, or 1 for malloc(0) */ @@ -560,6 +563,12 @@ omalloc_init(struct dir_info **dp) case 'J': mopts.malloc_junk = 2; break; + case 'v': + mopts.malloc_validate = 0; + break; + case 'V': + mopts.malloc_validate = 1; + break; case 'n': case 'N': break; @@ -608,6 +617,9 @@ omalloc_init(struct dir_info **dp) } } + if (!mopts.malloc_junk) + mopts.malloc_validate = 0; + #ifdef MALLOC_STATS if (mopts.malloc_stats && (atexit(malloc_exit) == -1)) { static const char q[] = "malloc() warning: atexit(2) failed." 
@@ -616,6 +628,12 @@ omalloc_init(struct dir_info **dp) } #endif /* MALLOC_STATS */ + if (mopts.malloc_validate && (atexit(validate_delayed_chunks) == -1)) { + static const char q[] = "malloc() warning: atexit(2) failed." + " Will not be able to check for use after free\n"; + write(STDERR_FILENO, q, sizeof(q) - 1); + } + while ((mopts.malloc_canary = arc4random()) == 0) ; @@ -1190,6 +1208,33 @@ malloc(size_t size) /*DEF_STRONG(malloc);*/ static void +validate_junk(void *p) { + struct region_info *r; + struct dir_info *pool = getpool(); + size_t byte, sz; + r = find(pool, p); + if (r == NULL) + wrterror("bogus pointer in validate_junk", p); + REALSIZE(sz, r); + for (byte = 0; byte < sz; byte++) { + if (((char *)p)[byte] != SOME_FREEJUNK) { + wrterror("use after free", p); + return; + } + } +} + +static void +validate_delayed_chunks(void) { + struct dir_info *pool = getpool(); + int i; + if (pool == NULL) + return; + for (i = 0; i < MALLOC_DELAYED_CHUNK_MASK + 1; i++) + validate_junk(pool->delayed_chunks[i]); +} + +static void ofree(void *p) { struct dir_info *pool = getpool(); @@ -1253,6 +1298,8 @@ ofree(void *p) wrterror("double free", p); return; } + if (mopts.malloc_validate) + validate_junk(p); pool->delayed_chunks[i] = tmp; } if (p != NULL) {
Re: support for malloc allocation canaries
On 25/10/15 06:20 PM, Ted Unangst wrote:
> Daniel Micay wrote:
>> This patch adds an opt-in malloc configuration option placing canaries
>> after small allocations to detect heap overflows on free(...). It's
>> intended to be used alongside guard pages for large allocations. Since
>> it's essentially adding extra padding to all small allocations, a small
>> heap overflow will be rendered harmless.
>
> This is all very cool. I'd like to look more at it sometime soon.

It's probably worth looking at the use-after-free detection patch first,
since it's simpler. It's also worth noting that the two features
conflict with each other; a small change is required to make them
compatible:

diff --git a/libc/bionic/omalloc.c b/libc/bionic/omalloc.c
index 91ae559..dd39117 100644
--- a/libc/bionic/omalloc.c
+++ b/libc/bionic/omalloc.c
@@ -1371,6 +1371,8 @@ ofree(void *p)
 			size_t byte;
 			r = find(pool, p);
 			REALSIZE(sz, r);
+			if (sz > 0 && sz <= MALLOC_MAXCHUNK)
+				sz -= mopts.malloc_canaries;
 			for (byte = 0; byte < sz; byte++) {
 				if (((char *)p)[byte] != SOME_FREEJUNK) {
 					wrterror("use after free", p);
Re: enhanced use-after-free detection for malloc
Er, here it is without the screwed-up whitespace (whoops):

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index 424dd77..7c33a7a 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -182,6 +182,7 @@ struct malloc_readonly {
 	int	malloc_freeunmap;	/* mprotect free pages PROT_NONE? */
 	int	malloc_hint;	/* call madvice on free pages? */
 	int	malloc_junk;	/* junk fill? */
+	int	malloc_validate;	/* validate junk */
 	int	malloc_move;	/* move allocations to end of page? */
 	int	malloc_realloc;	/* always realloc? */
 	int	malloc_xmalloc;	/* xmalloc behaviour? */
@@ -560,6 +561,12 @@ omalloc_init(struct dir_info **dp)
 		case 'J':
 			mopts.malloc_junk = 2;
 			break;
+		case 'v':
+			mopts.malloc_validate = 0;
+			break;
+		case 'V':
+			mopts.malloc_validate = 1;
+			break;
 		case 'n':
 		case 'N':
 			break;
@@ -1253,6 +1260,17 @@ ofree(void *p)
 				wrterror("double free", p);
 				return;
 			}
+			if (mopts.malloc_junk && mopts.malloc_validate && p != NULL) {
+				size_t byte;
+				r = find(pool, p);
+				REALSIZE(sz, r);
+				for (byte = 0; byte < sz; byte++) {
+					if (((char *)p)[byte] != SOME_FREEJUNK) {
+						wrterror("use after free", p);
+						return;
+					}
+				}
+			}
 			pool->delayed_chunks[i] = tmp;
 		}
 		if (p != NULL) {
-- 
2.6.2
Re: support for malloc allocation canaries
On 23/10/15 07:22 AM, Janne Johansson wrote:
> With this patch on -current and
> ln -sf CJ /etc/malloc.conf
> a lot of things stopped working, like "man ls" and "tmux".
> With only C or only J, the system seems to work ok, but with both it
> doesn't.
> Will check the resulting core files.
> ntpd and tmux both bombed out on memset inside realloc if I read the
> dumps ok:

Thanks for testing it. I oversimplified handling the canary padding in
the orealloc malloc_junk == 2 branch while trying to clean up the code.
This should work properly:

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index 424dd77..567e56d 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -185,12 +185,14 @@ struct malloc_readonly {
 	int	malloc_move;	/* move allocations to end of page? */
 	int	malloc_realloc;	/* always realloc? */
 	int	malloc_xmalloc;	/* xmalloc behaviour? */
+	size_t	malloc_canaries;	/* use canaries after chunks? */
 	size_t	malloc_guard;	/* use guard pages after allocations? */
 	u_int	malloc_cache;	/* free pages we cache */
 #ifdef MALLOC_STATS
 	int	malloc_stats;	/* dump statistics at end */
 #endif
 	u_int32_t malloc_canary;	/* Matched against ones in malloc_pool */
+	uintptr_t malloc_chunk_canary;
 };

 /* This object is mapped PROT_READ after initialisation to prevent tampering */
@@ -526,6 +528,12 @@ omalloc_init(struct dir_info **dp)
 		case 'A':
 			mopts.malloc_abort = 1;
 			break;
+		case 'c':
+			mopts.malloc_canaries = 0;
+			break;
+		case 'C':
+			mopts.malloc_canaries = sizeof(void *);
+			break;
 #ifdef MALLOC_STATS
 		case 'd':
 			mopts.malloc_stats = 0;
@@ -619,6 +627,9 @@ omalloc_init(struct dir_info **dp)
 	while ((mopts.malloc_canary = arc4random()) == 0)
 		;

+	arc4random_buf(&mopts.malloc_chunk_canary,
+	    sizeof(mopts.malloc_chunk_canary));
+
 	/*
 	 * Allocate dir_info with a guard page on either side. Also
 	 * randomise offset inside the page at which the dir_info
@@ -984,8 +995,15 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
 	k += (lp - bp->bits) * MALLOC_BITS;
 	k <<= bp->shift;

+	if (mopts.malloc_canaries && bp->size > 0) {
+		char *end = (char *)bp->page + k + bp->size;
+		uintptr_t *canary = (uintptr_t *)(end - mopts.malloc_canaries);
+		*canary = mopts.malloc_chunk_canary ^ hash(canary);
+	}
+
 	if (mopts.malloc_junk == 2 && bp->size > 0)
-		memset((char *)bp->page + k, SOME_JUNK, bp->size);
+		memset((char *)bp->page + k, SOME_JUNK,
+		    bp->size - mopts.malloc_canaries);
 	return ((char *)bp->page + k);
 }

@@ -999,6 +1017,13 @@ find_chunknum(struct dir_info *d, struct region_info *r, void *ptr)
 	if (info->canary != d->canary1)
 		wrterror("chunk info corrupted", NULL);

+	if (mopts.malloc_canaries && info->size > 0) {
+		char *end = (char *)ptr + info->size;
+		uintptr_t *canary = (uintptr_t *)(end - mopts.malloc_canaries);
+		if (*canary != (mopts.malloc_chunk_canary ^ hash(canary)))
+			wrterror("chunk canary corrupted", ptr);
+	}
+
 	/* Find the chunk number on the page */
 	chunknum = ((uintptr_t)ptr & MALLOC_PAGEMASK) >> info->shift;

@@ -1121,7 +1146,7 @@ omalloc(size_t sz, int zero_fill, void *f)
 		/* takes care of SOME_JUNK */
 		p = malloc_bytes(pool, sz, f);
 		if (zero_fill && p != NULL && sz > 0)
-			memset(p, 0, sz);
+			memset(p, 0, sz - mopts.malloc_canaries);
 	}

 	return p;
@@ -1176,6 +1201,8 @@ malloc(size_t size)
 		malloc_recurse();
 		return NULL;
 	}
+	if (size > 0 && size <= MALLOC_MAXCHUNK)
+		size += mopts.malloc_canaries;
 	r = omalloc(size, 0, CALLER);
 	malloc_active--;
 	_MALLOC_UNLOCK();
@@ -1242,7 +1269,7 @@ ofree(void *p)
 		int i;

 		if (mopts.malloc_junk && sz > 0)
-			memset(p, SOME_FREEJUNK, sz);
+			memset(p, SOME_FREEJUNK, sz - mopts.malloc_canaries);
 		if (!mopts.malloc_freenow) {
 			if (find_chunknum(pool, r, p) == -1)
 				return;
@@ -1386,16 +1413,25 @@ gotit:
 		}
 	}
 	if (newsz <= oldsz && newsz > oldsz / 2 && !mopts.malloc_realloc) {
-		if (mopts.malloc_junk == 2 && newsz > 0)
-			memset((char
enhanced use-after-free detection for malloc
This patch adds a form of use-after-free detection based on validating that the junk data is still in place when swapping out an allocation from the delayed chunk cache. It will probably nearly double the cost of the junk free feature that's enabled by default since it needs to do a whole extra pass over the data, thus splitting it out into a separate option instead of always doing it. It can catch UAF issues when Freeguard / Free unmap will not since those can only kick in when a whole page is cleared out. This could be extended to also check that the data is *either* zeroed or junk when handing out a chunk as a new allocation. That would add an additional pass over the data, so perhaps this should be given 3 levels like the junk free feature (like this patch by default, disabled with v, also validating the data when doing allocation with V). diff --git a/stdlib/malloc.c b/stdlib/malloc.c index 424dd77..7c33a7a 100644 --- a/stdlib/malloc.c +++ b/stdlib/malloc.c @@ -182,6 +182,7 @@ struct malloc_readonly { int malloc_freeunmap; /* mprotect free pages PROT_NONE? */ int malloc_hint;/* call madvice on free pages? */ int malloc_junk;/* junk fill? */ + int malloc_validate;/* validate junk */ int malloc_move;/* move allocations to end of page? */ int malloc_realloc; /* always realloc? */ int malloc_xmalloc; /* xmalloc behaviour? */ @@ -560,6 +561,12 @@ omalloc_init(struct dir_info **dp) case 'J': mopts.malloc_junk = 2; break; + case 'v': + mopts.malloc_validate = 0; + break; + case 'V': + mopts.malloc_validate = 1; + break; case 'n': case 'N': break; @@ -1253,6 +1260,17 @@ ofree(void *p) wrterror("double free", p); return; } + if (mopts.malloc_junk && mopts.malloc_validate && p != NULL) { + size_t byte; + r = find(pool, p); + REALSIZE(sz, r); + for (byte = 0; byte < sz; byte++) { + if (((char *)p)[byte] != SOME_FREEJUNK) { + wrterror("use after free", p); + return; + } + } + } pool->delayed_chunks[i] = tmp; } if (p != NULL) {
support for malloc allocation canaries
Hi,

This patch adds an opt-in malloc configuration option placing canaries
after small allocations to detect heap overflows on free(...). It's
intended to be used alongside guard pages for large allocations. Since
it's essentially adding extra padding to all small allocations, a small
heap overflow will be rendered harmless.

The current implementation uses pointer-size canaries, but it could
easily be extended to allow bumping up the size of the canaries by
passing the option multiple times. The entry points into malloc account
for the canary size when it's enabled; the canary is then generated on
allocation and checked on free. Small allocations without room for a
canary are simply turned into large allocations. Some care needs to be
taken to avoid clobbering the canary in the junk-filling code and in
realloc copying.

The canary is placed at the very end of the memory allocation, so there
will often be slack space between the real allocation and the canary,
preventing small overflows from being detected. It would be much better
at detecting corruption with finer-grained size classes. The extreme
would be every multiple of the alignment, but logarithmic growth would
be more realistic (see jemalloc's size classes). Finer-grained size
classes would also reduce the memory overhead caused by allocations
being pushed into the next size class by the canary.

The canaries are currently generated with canary_value ^
hash(canary_address). It would be best to avoid involving addresses, to
avoid introducing address leaks via read overflows where there were none
before, but it's the easiest way to get unique canaries and is a minor
issue to improve down the road.

I implemented this feature after porting OpenBSD malloc to Android (in
CopperheadOS) and it has found a few bugs in the app ecosystem. Note
that I've only heavily tested it there, not on OpenBSD itself. I'm not
sure if you want this feature, but it seemed worth submitting. Hopefully
you don't mind a patch generated with Git.
:)

diff --git a/stdlib/malloc.c b/stdlib/malloc.c
index 424dd77..65b5027 100644
--- a/stdlib/malloc.c
+++ b/stdlib/malloc.c
@@ -185,12 +185,14 @@ struct malloc_readonly {
 	int	malloc_move;	/* move allocations to end of page? */
 	int	malloc_realloc;	/* always realloc? */
 	int	malloc_xmalloc;	/* xmalloc behaviour? */
+	size_t	malloc_canaries;	/* use canaries after chunks? */
 	size_t	malloc_guard;	/* use guard pages after allocations? */
 	u_int	malloc_cache;	/* free pages we cache */
#ifdef MALLOC_STATS
 	int	malloc_stats;	/* dump statistics at end */
#endif
 	u_int32_t malloc_canary;	/* Matched against ones in malloc_pool */
+	uintptr_t malloc_chunk_canary;
 };

 /* This object is mapped PROT_READ after initialisation to prevent tampering */
@@ -526,6 +528,12 @@ omalloc_init(struct dir_info **dp)
 		case 'A':
 			mopts.malloc_abort = 1;
 			break;
+		case 'c':
+			mopts.malloc_canaries = 0;
+			break;
+		case 'C':
+			mopts.malloc_canaries = sizeof(void *);
+			break;
#ifdef MALLOC_STATS
 		case 'd':
 			mopts.malloc_stats = 0;
@@ -619,6 +627,9 @@ omalloc_init(struct dir_info **dp)
 	while ((mopts.malloc_canary = arc4random()) == 0)
 		;

+	arc4random_buf(&mopts.malloc_chunk_canary,
+	    sizeof(mopts.malloc_chunk_canary));
+
 	/*
 	 * Allocate dir_info with a guard page on either side. Also
 	 * randomise offset inside the page at which the dir_info
@@ -984,8 +995,15 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
 	k += (lp - bp->bits) * MALLOC_BITS;
 	k <<= bp->shift;

+	if (mopts.malloc_canaries && bp->size > 0) {
+		char *end = (char *)bp->page + k + bp->size;
+		uintptr_t *canary = (uintptr_t *)(end - mopts.malloc_canaries);
+		*canary = mopts.malloc_chunk_canary ^ hash(canary);
+	}
+
 	if (mopts.malloc_junk == 2 && bp->size > 0)
-		memset((char *)bp->page + k, SOME_JUNK, bp->size);
+		memset((char *)bp->page + k, SOME_JUNK,
+		    bp->size - mopts.malloc_canaries);
 	return ((char *)bp->page + k);
 }

@@ -999,6 +1017,13 @@ find_chunknum(struct dir_info *d, struct region_info *r, void *ptr)
 	if (info->canary != d->canary1)
 		wrterror("chunk info corrupted", NULL);

+	if (mopts.malloc_canaries && info->size > 0) {
+		char *end = (char *)ptr + info->size;
+		uintptr_t *canary = (uintptr_t *)(end - mopts.malloc_canaries);
+		if (*canary != (mopts.malloc_chunk_canary ^ hash(canary)))
+