Re: malloc.c: better double free check

2017-09-27 Thread Theo de Raadt
> On Sun, Sep 24, 2017 at 09:02:58PM -0400, Daniel Micay wrote:
> 
> > > In the end all double frees still will be caught by the actual free
> > > code, just with a delay. The delayed free buffer double free check is
> > > just a way of catching it as soon as possible to make debugging
> > > easier.  That's the reason the original code could just do the check
> > > on the slot being replaced only.
> > > 
> > > The only case that could be missed is when the chunk is given out by
> > > malloc in between the original free and the double free. But that
> > > case can never be caught in all circumstances since the delay buffer is of
> > > finite size.
> > > 
> > >   -Otto
> > 
> > True, the delay buffer currently only guarantees allocations are kept
> > out of circulation for one cycle since the random choice is between
> > previously freed allocations, never the current one.
> > 
> > It matters more with the other change making half of the quarantine into
> > a ring buffer to provide a longer guaranteed delay. I think that makes
> > sense as a trade-off vs. an extra bit of entropy from a 2x larger random
> > array for a given total quarantine size. It also improves the write-
> > after-free detection, especially with a configurable quarantine size,
> > which makes it somewhat like the ASan quarantine but with delayed
> > detection of write-after-free and only indirect read-after-free
> > detection via junk filling (i.e. if something ends up crashing /
> > breaking from reading junk instead of what it expected).
> 
> I'm hesitant to add too much complexity here. Delayed free is to add
> extra randomization to the re-use of chunks. Detecting double free is
> just an extra. If it can be done cheaply, good. But otherwise...
> 
> BTW, your hash function causes too many collisions of chunks, since the n
> lower bits of them are zero. Probably best to shift them right by
> info->shift bits instead of MALLOC_MINSHIFT.

I'll add my two bits.  In the last decade otto's malloc has collected
a vast number of 'cheap' checks.  The cheap ones are the most important
ones, because they can be enabled by default.  As such, they expose bugs
in software which can be resolved.  The mere indication that a piece
of software has one corruption bug detected by malloc tends to indicate
it has more, so this can drive auditing focus.

However, non-cheap checks worry me.  They need to be enabled using
feature flags.  The obvious one is 'S', but now these non-cheap checks
impact the people trying to run with 'S' in support of the community;
their software gets measurably slower, so some people may stop using 'S'.

I think anything beyond O(very small) doesn't have a place here; we
need to hope that other tooling / review / checks will find those bugs.
Other mallocs entirely lack these checks; they are relying on those
other discovery processes.

That doesn't mean our users need to carry the burden fully.

And please see:  https://arxiv.org/pdf/1709.02746.pdf

(I don't like their pre-reserve design, and hope there is opportunity
to improve the O(not tiny) parts in otto's malloc)




Re: malloc.c: better double free check

2017-09-25 Thread Otto Moerbeek
On Sun, Sep 24, 2017 at 09:02:58PM -0400, Daniel Micay wrote:

> > In the end all double frees still will be caught by the actual free
> > code, just with a delay. The delayed free buffer double free check is
> > just a way of catching it as soon as possible to make debugging
> > easier.  That's the reason the original code could just do the check
> > on the slot being replaced only.
> > 
> > The only case that could be missed is when the chunk is given out by
> > malloc in between the original free and the double free. But that
> > case can never be caught in all circumstances since the delay buffer is of
> > finite size.
> > 
> > -Otto
> 
> True, the delay buffer currently only guarantees allocations are kept
> out of circulation for one cycle since the random choice is between
> previously freed allocations, never the current one.
> 
> It matters more with the other change making half of the quarantine into
> a ring buffer to provide a longer guaranteed delay. I think that makes
> sense as a trade-off vs. an extra bit of entropy from a 2x larger random
> array for a given total quarantine size. It also improves the write-
> after-free detection, especially with a configurable quarantine size,
> which makes it somewhat like the ASan quarantine but with delayed
> detection of write-after-free and only indirect read-after-free
> detection via junk filling (i.e. if something ends up crashing /
> breaking from reading junk instead of what it expected).

I'm hesitant to add too much complexity here. Delayed free is to add
extra randomization to the re-use of chunks. Detecting double free is
just an extra. If it can be done cheaply, good. But otherwise...

BTW, your hash function causes too many collisions of chunks, since the n
lower bits of them are zero. Probably best to shift them right by
info->shift bits instead of MALLOC_MINSHIFT.

-Otto
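
For concreteness, here is a minimal sketch of the adjustment suggested
above: parameterize the pointer hash on the chunk's own size shift
instead of the fixed MALLOC_MINSHIFT, so the alignment bits that are
always zero get divided out before mixing. The function and parameter
names (hash_chunk_shifted, chunk_shift standing in for info->shift) are
hypothetical and not from any posted patch.

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical variant of hash_chunk(): chunks of size 1 << chunk_shift
 * are aligned to that size, so shifting by chunk_shift (info->shift in
 * malloc.c terms) drops the bits that are always zero and spreads the
 * hash values better than shifting by MALLOC_MINSHIFT does.
 */
static inline size_t
hash_chunk_shifted(void *p, unsigned int chunk_shift)
{
	size_t sum;
	uintptr_t u;

	u = (uintptr_t)p >> chunk_shift;
	sum = u;
	sum = (sum << 7) - sum + (u >> 16);
#ifdef __LP64__
	sum = (sum << 7) - sum + (u >> 32);
	sum = (sum << 7) - sum + (u >> 48);
#endif
	return sum;
}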



Re: malloc.c: better double free check

2017-09-24 Thread Daniel Micay
> In the end all double frees still will be caught by the actual free
> code, just with a delay. The delayed free buffer double free check is
> just a way of catching it as soon as possible to make debugging
> easier.  That's the reason the original code could just do the check
> on the slot being replaced only.
> 
> The only case that could be missed is when the chunk is given out by
> malloc in between the original free and the double free. But that
> case can never be caught in all circumstances since the delay buffer is of
> finite size.
> 
>   -Otto

True, the delay buffer currently only guarantees allocations are kept
out of circulation for one cycle since the random choice is between
previously freed allocations, never the current one.

It matters more with the other change making half of the quarantine into
a ring buffer to provide a longer guaranteed delay. I think that makes
sense as a trade-off vs. an extra bit of entropy from a 2x larger random
array for a given total quarantine size. It also improves the write-
after-free detection, especially with a configurable quarantine size,
which makes it somewhat like the ASan quarantine but with delayed
detection of write-after-free and only indirect read-after-free
detection via junk filling (i.e. if something ends up crashing /
breaking from reading junk instead of what it expected).
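
To make the ring-buffer-plus-random-array idea concrete, here is a
minimal sketch (an illustration under stated assumptions, not
CopperheadOS or OpenBSD code): a freed chunk first spends a guaranteed
number of frees in a FIFO ring, then moves into a random slot, and
whatever that slot held is what finally goes down the real free path.
The names, sizes and the rbyte parameter (standing in for getrbyte())
are hypothetical; double-free checks would sit on top of this.

#include <stddef.h>

#define RING_SLOTS	16		/* FIFO part: guaranteed delay */
#define RAND_SLOTS	16		/* randomized part */

struct quarantine {
	void	*ring[RING_SLOTS];
	size_t	 head;			/* next ring slot to evict */
	void	*rnd[RAND_SLOTS];
};

/*
 * Put p into quarantine and return the chunk that should now really be
 * freed (NULL while the quarantine is still filling up).
 */
static void *
quarantine_swap(struct quarantine *q, void *p, unsigned char rbyte)
{
	void *evicted, *out;

	evicted = q->ring[q->head];	/* oldest ring entry leaves */
	q->ring[q->head] = p;
	q->head = (q->head + 1) % RING_SLOTS;

	if (evicted == NULL)
		return NULL;

	out = q->rnd[rbyte % RAND_SLOTS];	/* random slot is emptied */
	q->rnd[rbyte % RAND_SLOTS] = evicted;
	return out;
}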



Re: malloc.c: better double free check

2017-09-24 Thread Otto Moerbeek
On Sat, Sep 23, 2017 at 05:19:58PM -0400, Daniel Micay wrote:

> On Sat, 2017-09-23 at 09:32 +0200, Otto Moerbeek wrote:
> > On Fri, Sep 22, 2017 at 04:35:39PM -0400, Daniel Micay wrote:
> > 
> > > A linear search works well for the current small quarantine (16)
> > > but won't work well if you ever want to have a larger /
> > > configurable quarantine size. It would also be nice to make this
> > > fast enough to enable by default.
> > > 
> > > We (CopperheadOS) use an open addressed hash table for this based
> > > on the existing hash table since we use a larger quarantine with a
> > > FIFO queue alongside the random array and a configurable size.
> > > Ideally the code would be shared with the existing hash table but
> > > I didn't want to make it into an invasive change downstream.
> > > 
> > > These are the three downstream patches for OpenBSD malloc in our
> > > copy of Bionic (Android libc), so I'd need to port them to the
> > > current upstream code to apply cleanly. They're currently applied
> > > after other changes and it's a slightly older copy of the base
> > > code (after multi-pool support, but before the canary rework since
> > > we'll need to adapt that to our needs). You can get the general
> > > idea from the patches even though they're not going to apply
> > > cleanly.
> > > 
> > > [1] quarantine double-free detection via hash table
> > 
> > Thanks for sharing this, I'll take a look soon. 
> > 
> > Thinking a bit about this: wouldn't a closed hash table be sufficient?
> > A collision would then either be a double free, or you'd just replace
> > old with new. You'll get O(1) lookup and insert and simpler code.
> 
> I wouldn't really want to have a random chance of missing a double-free
> even if the chance is small though.

In the end all double frees still will be caught by the actual free
code, just with a delay. The delayed free buffer double free check is
just a way of catching it as soon as possible to make debugging
easier.  That's the reason the original code could just do the check
on the slot being replaced only.

The only case that could be missed is when the chunk is given out by
malloc in between the original free and the double free. But that
case can never be caught in all circumstances since the delay buffer is of
finite size.

-Otto
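
A toy model may help make the behaviour described above concrete (this
is a simplification for illustration, not malloc.c itself): the early
check only looks at the randomly chosen slot being replaced, so a
duplicate free is either caught right there or, at the latest, when the
first copy leaves the buffer and the real free path trips over
already-freed bookkeeping. toy_free() and its helpers are hypothetical
stand-ins.

#include <stdio.h>
#include <stdlib.h>

#define SLOTS 16

static void *delayed[SLOTS];

/* Toy stand-in for the chunk free path with a delayed-free buffer. */
static void
toy_free(void *p)
{
	int i = rand() % SLOTS;		/* stand-in for getrbyte() */
	void *out = delayed[i];

	if (out == p) {
		/* early detection: same pointer still sits in this slot */
		fprintf(stderr, "double free %p\n", p);
		abort();
	}
	delayed[i] = p;
	/*
	 * "out" now goes down the real free path.  If it was freed
	 * twice, the allocator's bookkeeping (find_chunknum, canaries,
	 * junk) catches the duplicate there -- later, but still caught,
	 * unless the chunk was handed out again in the meantime.
	 */
	if (out != NULL)
		free(out);
}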



Re: malloc.c: better double free check

2017-09-23 Thread Daniel Micay
On Sat, 2017-09-23 at 09:32 +0200, Otto Moerbeek wrote:
> On Fri, Sep 22, 2017 at 04:35:39PM -0400, Daniel Micay wrote:
> 
> > A linear search works well for the current small quarantine (16) but
> > won't work well if you ever want to have a larger / configurable
> > quarantine size. It would also be nice to make this fast enough to
> > enable by default.
> > 
> > We (CopperheadOS) use an open addressed hash table for this based on
> > the existing hash table since we use a larger quarantine with a FIFO
> > queue alongside the random array and a configurable size. Ideally the
> > code would be shared with the existing hash table but I didn't want to
> > make it into an invasive change downstream.
> > 
> > These are the three downstream patches for OpenBSD malloc in our copy
> > of Bionic (Android libc), so I'd need to port them to the current
> > upstream code to apply cleanly. They're currently applied after other
> > changes and it's a slightly older copy of the base code (after
> > multi-pool support, but before the canary rework since we'll need to
> > adapt that to our needs). You can get the general idea from the
> > patches even though they're not going to apply cleanly.
> > 
> > [1] quarantine double-free detection via hash table
> 
> Thanks for sharing this, I'll take a look soon. 
> 
> Thinking a bit about this: wouldn't a closed hash table be sufficient?
> A collision would then either be a double free, or you'd just replace
> old with new. You'll get O(1) lookup and insert and simpler code.

I wouldn't really want to have a random chance of missing a double-free
even if the chance is small though.

It's a copy of the hash table used to track regions with minor tweaks, so
it'd be possible to make the existing code able to handle both, but it
would mean turning it into macros. I'm not sure which is the lesser evil
for just 2 uses of a fairly simple data structure.
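
For what the macro route might look like, a rough sketch (purely
hypothetical, not a proposed patch): one open-addressing probe loop,
instantiated for both tables by parameterizing the array, the hash value
and the duplicate action. The region table stores struct region_info
rather than bare pointers, so the real thing would need a little more
parameterization than shown here.

#include <stddef.h>	/* size_t, needed where the macro expands */

/*
 * Hypothetical generic insert for an open-addressed table of non-NULL
 * keys.  "on_dup" is expected not to return (as wrterror() does); the
 * entry is stored in the first empty slot found while probing down.
 */
#define OA_INSERT(tbl, mask, hashval, key, on_dup)			\
	do {								\
		size_t _i = (hashval) & (mask);				\
		while ((tbl)[_i] != NULL) {				\
			if ((tbl)[_i] == (key))				\
				on_dup;					\
			_i = (_i - 1) & (mask);				\
		}							\
		(tbl)[_i] = (key);					\
	} while (0)

/* delayed_chunks_insert() from the posted patch would then reduce to:
 *
 *	OA_INSERT(d->delayed_chunks_set, mask, hash_chunk(p), p,
 *	    wrterror(d, "double free", p));
 */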



Re: malloc.c: better double free check

2017-09-23 Thread Theo Buehler
On Sat, Sep 23, 2017 at 02:05:34PM +, Otto Moerbeek wrote:
> On Sat, Sep 23, 2017 at 05:28:57AM -0400, Ted Unangst wrote:
> 
> > Otto Moerbeek wrote:
> > > Hi,
> > > 
> > > Malloc maintains a list of 16 slots of chunks to be freed. On free a
> > > chunk is put in a random slot and the existing chunk in that slot is
> > > actually freed. Currently, the code only checks the slot selected for
> > > a double free.
> > > 
> > > This diff adds code to check all slots. It also removes the option to
> > > disable delayed free. 
> > 
> > I thought we were already doing this, so ok. :)
> 
> And here's the manpage diff.
> 
>   -Otto
> 
> Index: malloc.conf.5
> ===
> RCS file: /cvs/src/share/man/man5/malloc.conf.5,v
> retrieving revision 1.11
> diff -u -p -r1.11 malloc.conf.5
> --- malloc.conf.5 31 Oct 2016 10:07:18 -  1.11
> +++ malloc.conf.5 23 Sep 2017 14:04:14 -
> @@ -56,18 +56,11 @@ at exit.
>  This option requires the library to have been compiled with -DMALLOC_STATS in
>  order to have any effect.
>  .It Cm F
> -.Dq Freeguard .
> -Enable use after free detection.
> +.Dq Freecheck .
> +Enable more extenisve double free and use after free detection.

with
s/extenisve/extensive

ok tb (as is the .c diff)

> +All chunks in the delayed free list will be checked for double frees.
>  Unused pages on the freelist are read and write protected to
>  cause a segmentation fault upon access.
> -This will also switch off the delayed freeing of chunks,
> -reducing random behaviour but detecting double
> -.Xr free 3
> -calls as early as possible.
> -This option is intended for debugging rather than improved security
> -(use the
> -.Cm U
> -option for security).
>  .It Cm G
>  .Dq Guard .
>  Enable guard pages.
> 



Re: malloc.c: better double free check

2017-09-23 Thread Otto Moerbeek
On Sat, Sep 23, 2017 at 05:28:57AM -0400, Ted Unangst wrote:

> Otto Moerbeek wrote:
> > Hi,
> > 
> > Malloc maintains a list of 16 slots of chunks to be freed. On free a
> > chunk is put in a random slot and the existing chunk in that slot is
> > actually freed. Currently, the code only checks the slot selected for
> > a double free.
> > 
> > This diff adds code to check all slots. It also removes the option to
> > disable delayed free. 
> 
> I thought we were already doing this, so ok. :)

And here's the manpage diff.

-Otto

Index: malloc.conf.5
===
RCS file: /cvs/src/share/man/man5/malloc.conf.5,v
retrieving revision 1.11
diff -u -p -r1.11 malloc.conf.5
--- malloc.conf.5   31 Oct 2016 10:07:18 -  1.11
+++ malloc.conf.5   23 Sep 2017 14:04:14 -
@@ -56,18 +56,11 @@ at exit.
 This option requires the library to have been compiled with -DMALLOC_STATS in
 order to have any effect.
 .It Cm F
-.Dq Freeguard .
-Enable use after free detection.
+.Dq Freecheck .
+Enable more extenisve double free and use after free detection.
+All chunks in the delayed free list will be checked for double frees.
 Unused pages on the freelist are read and write protected to
 cause a segmentation fault upon access.
-This will also switch off the delayed freeing of chunks,
-reducing random behaviour but detecting double
-.Xr free 3
-calls as early as possible.
-This option is intended for debugging rather than improved security
-(use the
-.Cm U
-option for security).
 .It Cm G
 .Dq Guard .
 Enable guard pages.



Re: malloc.c: better double free check

2017-09-23 Thread Ted Unangst
Otto Moerbeek wrote:
> Hi,
> 
> Malloc maintains a list of 16 slots of chunks to be freed. On free a
> chunk is put in a random slot and the existing chunk in that slot is
> actually freed. Currently, the code only checks the slot selected for
> a double free.
> 
> This diff adds code to check all slots. It also removes the option to
> disable delayed free. 

I thought we were already doing this, so ok. :)



Re: malloc.c: better double free check

2017-09-23 Thread Otto Moerbeek
On Fri, Sep 22, 2017 at 04:35:39PM -0400, Daniel Micay wrote:

> A linear search works well for the current small quarantine (16) but
> won't work well if you ever want to have a larger / configurable
> quarantine size. It would also be nice to make this fast enough to
> enable by default.
> 
> We (CopperheadOS) use an open addressed hash table for this based on the
> existing hash table since we use a larger quarantine with a FIFO queue
> alongside the random array and a configurable size. Ideally the code
> would be shared with the existing hash table but I didn't want to make
> it into an invasive change downstream.
> 
> These are the three downstream patches for OpenBSD malloc in our copy of
> Bionic (Android libc), so I'd need to port them to the current upstream
> code to apply cleanly. They're currently applied after other changes and
> it's a slightly older copy of the base code (after multi-pool support,
> but before the canary rework since we'll need to adapt that to our
> needs). You can get the general idea from the patches even though
> they're not going to apply cleanly.
> 
> [1] quarantine double-free detection via hash table

Thanks for sharing this, I'll take a look soon. 

Thinking a bit about this: wouldn't a closed hash table be sufficient?
A collision would then either be a double free, or you'd just replace
old with new. You'll get O(1) lookup and insert and simpler code.

-Otto
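
One reading of that suggestion, as a hypothetical sketch (not code from
this thread): each chunk hashes to exactly one slot, a matching occupant
means a double free, and any other occupant is simply evicted and really
freed. Insert and lookup are O(1), at the price that an eviction can
push a chunk out before its duplicate free arrives, so detection becomes
probabilistic. The names and the table size are illustrative only;
hashfn and dblfree stand in for hash_chunk() and wrterror().

#include <stddef.h>
#include <stdint.h>

#define DELAY_SLOTS	64		/* hypothetical power-of-two size */

static void *delay_tbl[DELAY_SLOTS];

/*
 * Put p into the slot its hash selects.  Returns the evicted chunk for
 * the caller to really free (NULL if the slot was empty).
 */
static void *
delay_replace(void *p, size_t (*hashfn)(void *), void (*dblfree)(void *))
{
	size_t i = hashfn(p) & (DELAY_SLOTS - 1);
	void *old = delay_tbl[i];

	if (old == p)
		dblfree(p);		/* same chunk freed twice */
	delay_tbl[i] = p;
	return old;
}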

> 
> diff --git a/libc/bionic/omalloc.c b/libc/bionic/omalloc.c
> index 23af59501..4f4e1290c 100644
> --- a/libc/bionic/omalloc.c
> +++ b/libc/bionic/omalloc.c
> @@ -157,6 +157,7 @@ struct dir_info {
>   struct region_info free_regions[MALLOC_MAXCACHE];
>   /* delayed free chunk slots */
>   void *delayed_chunks[MALLOC_DELAYED_CHUNK_MASK + 1];
> + void *delayed_chunks_set[(MALLOC_DELAYED_CHUNK_MASK + 1) * 4];
>   size_t rbytesused;  /* random bytes used */
>   char *func; /* current function */
>   int mutex;
> @@ -292,6 +293,22 @@ hash(void *p)
>   return sum;
>  }
>  
> +static inline size_t
> +hash_chunk(void *p)
> +{
> + size_t sum;
> + uintptr_t u;
> +
> + u = (uintptr_t)p >> MALLOC_MINSHIFT;
> + sum = u;
> + sum = (sum << 7) - sum + (u >> 16);
> +#ifdef __LP64__
> + sum = (sum << 7) - sum + (u >> 32);
> + sum = (sum << 7) - sum + (u >> 48);
> +#endif
> + return sum;
> +}
> +
>  static inline
>  struct dir_info *getpool(void)
>  {
> @@ -983,6 +1000,56 @@ delete(struct dir_info *d, struct region_info *ri)
>   }
>  }
>  
> +void delayed_chunks_insert(struct dir_info *d, void *p)
> +{
> + size_t index;
> + size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
> + void *q;
> +
> + index = hash_chunk(p) & mask;
> + q = d->delayed_chunks_set[index];
> + while (q != NULL) {
> + if (p == q)
> + wrterror(d, "double free", p);
> + index = (index - 1) & mask;
> + q = d->delayed_chunks_set[index];
> + }
> + d->delayed_chunks_set[index] = p;
> +}
> +
> +void delayed_chunks_delete(struct dir_info *d, void *p)
> +{
> + size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
> + size_t i, j, r;
> + void *q;
> +
> + i = hash_chunk(p) & mask;
> + q = d->delayed_chunks_set[i];
> + while (q != p) {
> + if (q == NULL)
> + wrterror(d, "pointer missing from address tracking table", p);
> + i = (i - 1) & mask;
> + q = d->delayed_chunks_set[i];
> + }
> +
> + for (;;) {
> + d->delayed_chunks_set[i] = NULL;
> + j = i;
> + for (;;) {
> + i = (i - 1) & mask;
> + if (d->delayed_chunks_set[i] == NULL)
> + return;
> + r = hash_chunk(d->delayed_chunks_set[i]) & mask;
> + if ((i <= r && r < j) || (r < j && j < i) ||
> + (j < i && i <= r))
> + continue;
> + d->delayed_chunks_set[j] = d->delayed_chunks_set[i];
> + break;
> + }
> + }
> +}
> +
> +
>  /*
>   * Allocate a page of chunks
>   */
> @@ -1474,11 +1541,21 @@ ofree(struct dir_info *argpool, void *p)
>   if (!mopts.malloc_freenow) {
>   if (find_chunknum(pool, r, p) == (uint32_t)-1)
>   goto done;
> +
> + if (p == NULL)
> + goto done;
> +
> + delayed_chunks_insert(pool, p);
> +
>   i = getrbyte(pool) & MALLOC_DELAYED_CHUNK_MASK;
>   tmp = p;
>   p = pool->delayed_chunks[i];
> - if (tmp == p)
> - wrterror(pool, "double free", p);
> +
> + if 

Re: malloc.c: better double free check

2017-09-22 Thread Daniel Micay
A linear search works well for the current small quarantine (16) but won't work
well if you ever want to have a larger / configurable quarantine size. It would
also be nice to make this fast enough to enable by default.

We (CopperheadOS) use an open addressed hash table for this based on the
existing hash table since we use a larger quarantine with a FIFO queue
alongside the random array and a configurable size. Ideally the code would be
shared with the existing hash table but I didn't want to make it into an
invasive change downstream.

These are the three downstream patches for OpenBSD malloc in our copy of Bionic
(Android libc), so I'd need to port them to the current upstream code to apply
cleanly. They're currently applied after other changes and it's a slightly
older copy of the base code (after multi-pool support, but before the canary
rework since we'll need to adapt that to our needs). You can get the general
idea from the patches even though they're not going to apply cleanly.

[1] quarantine double-free detection via hash table

diff --git a/libc/bionic/omalloc.c b/libc/bionic/omalloc.c
index 23af59501..4f4e1290c 100644
--- a/libc/bionic/omalloc.c
+++ b/libc/bionic/omalloc.c
@@ -157,6 +157,7 @@ struct dir_info {
struct region_info free_regions[MALLOC_MAXCACHE];
/* delayed free chunk slots */
void *delayed_chunks[MALLOC_DELAYED_CHUNK_MASK + 1];
+   void *delayed_chunks_set[(MALLOC_DELAYED_CHUNK_MASK + 1) * 4];
size_t rbytesused;  /* random bytes used */
char *func; /* current function */
int mutex;
@@ -292,6 +293,22 @@ hash(void *p)
return sum;
 }
 
+static inline size_t
+hash_chunk(void *p)
+{
+   size_t sum;
+   uintptr_t u;
+
+   u = (uintptr_t)p >> MALLOC_MINSHIFT;
+   sum = u;
+   sum = (sum << 7) - sum + (u >> 16);
+#ifdef __LP64__
+   sum = (sum << 7) - sum + (u >> 32);
+   sum = (sum << 7) - sum + (u >> 48);
+#endif
+   return sum;
+}
+
 static inline
 struct dir_info *getpool(void)
 {
@@ -983,6 +1000,56 @@ delete(struct dir_info *d, struct region_info *ri)
}
 }
 
+void delayed_chunks_insert(struct dir_info *d, void *p)
+{
+   size_t index;
+   size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+   void *q;
+
+   index = hash_chunk(p) & mask;
+   q = d->delayed_chunks_set[index];
+   while (q != NULL) {
+   if (p == q)
+   wrterror(d, "double free", p);
+   index = (index - 1) & mask;
+   q = d->delayed_chunks_set[index];
+   }
+   d->delayed_chunks_set[index] = p;
+}
+
+void delayed_chunks_delete(struct dir_info *d, void *p)
+{
+   size_t mask = sizeof(d->delayed_chunks_set) / sizeof(void *) - 1;
+   size_t i, j, r;
+   void *q;
+
+   i = hash_chunk(p) & mask;
+   q = d->delayed_chunks_set[i];
+   while (q != p) {
+   if (q == NULL)
+   wrterror(d, "pointer missing from address tracking table", p);
+   i = (i - 1) & mask;
+   q = d->delayed_chunks_set[i];
+   }
+
+   for (;;) {
+   d->delayed_chunks_set[i] = NULL;
+   j = i;
+   for (;;) {
+   i = (i - 1) & mask;
+   if (d->delayed_chunks_set[i] == NULL)
+   return;
+   r = hash_chunk(d->delayed_chunks_set[i]) & mask;
+   if ((i <= r && r < j) || (r < j && j < i) ||
+   (j < i && i <= r))
+   continue;
+   d->delayed_chunks_set[j] = d->delayed_chunks_set[i];
+   break;
+   }
+   }
+}
+
+
 /*
  * Allocate a page of chunks
  */
@@ -1474,11 +1541,21 @@ ofree(struct dir_info *argpool, void *p)
if (!mopts.malloc_freenow) {
if (find_chunknum(pool, r, p) == (uint32_t)-1)
goto done;
+
+   if (p == NULL)
+   goto done;
+
+   delayed_chunks_insert(pool, p);
+
i = getrbyte(pool) & MALLOC_DELAYED_CHUNK_MASK;
tmp = p;
p = pool->delayed_chunks[i];
-   if (tmp == p)
-   wrterror(pool, "double free", p);
+
+   if (p == NULL)
+   goto done;
+
+   delayed_chunks_delete(pool, p);
+
if (mopts.malloc_junk)
validate_junk(pool, p);
pool->delayed_chunks[i] = tmp;
@@ -2264,6 +2341,7 @@ malloc_dump(int fd, struct dir_info *pool)
r = find(pool, p);
if (r == NULL)
wrterror(pool, "bogus pointer in 

malloc.c: better double free check

2017-09-22 Thread Otto Moerbeek
Hi,

Malloc maintains a list of 16 slots of chunks to be freed. On free a
chunk is put in a random slot and the existing chunk in that slot is
actually freed. Currently, the code only checks the slot selected for
a double free.

This diff adds code to check all slots. It also removes the option to
disable delayed free. 

For now, this extra check is only enabled with F. F is also added to S.
As a bonus, the chunk canary corruption check will also hint at a
double free if applicable.

Please review and test. malloc.conf diff will come later.

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.231
diff -u -p -r1.231 malloc.c
--- malloc.c    12 Sep 2017 18:36:30 -  1.231
+++ malloc.c    19 Sep 2017 13:04:47 -
@@ -179,7 +179,7 @@ struct chunk_info {
 struct malloc_readonly {
struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping information */
int malloc_mt;  /* multi-threaded mode? */
-   int malloc_freenow; /* Free quickly - disable chunk rnd */
+   int malloc_freecheck;   /* Extensive double free check */
int malloc_freeunmap;   /* mprotect free pages PROT_NONE? */
int malloc_junk;/* junk fill? */
int malloc_realloc; /* always realloc? */
@@ -520,11 +520,11 @@ omalloc_parseopt(char opt)
break;
 #endif /* MALLOC_STATS */
case 'f':
-   mopts.malloc_freenow = 0;
+   mopts.malloc_freecheck = 0;
mopts.malloc_freeunmap = 0;
break;
case 'F':
-   mopts.malloc_freenow = 1;
+   mopts.malloc_freecheck = 1;
mopts.malloc_freeunmap = 1;
break;
case 'g':
@@ -605,7 +605,7 @@ omalloc_init(void)
for (; p != NULL && *p != '\0'; p++) {
switch (*p) {
case 'S':
-   for (q = "CGJ"; *q != '\0'; q++)
+   for (q = "CFGJ"; *q != '\0'; q++)
omalloc_parseopt(*q);
mopts.malloc_cache = 0;
break;
@@ -1046,10 +1046,12 @@ validate_canary(struct dir_info *d, u_ch
q = p + check_sz;
 
while (p < q) {
-   if (*p++ != SOME_JUNK) {
-   wrterror(d, "chunk canary corrupted %p %#tx@%#zx",
-   ptr, p - ptr - 1, sz);
+   if (*p != SOME_JUNK) {
+   wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
+   ptr, p - ptr, sz, *p == SOME_FREEJUNK ?
+   " (double free?)" : "");
}
+   p++;
}
 }
 
@@ -1381,13 +1383,18 @@ ofree(struct dir_info *argpool, void *p,
unmap(pool, p, PAGEROUND(sz), clear);
delete(pool, r);
} else {
-   void *tmp;
-   int i;
-
-   /* Delayed free or canaries? Extra check */
-   if (!mopts.malloc_freenow || mopts.chunk_canaries)
-   find_chunknum(pool, r, p, mopts.chunk_canaries);
-   if (!clear && !mopts.malloc_freenow) {
+   /* Validate and optionally canary check */
+   find_chunknum(pool, r, p, mopts.chunk_canaries);
+   if (!clear) {
+   void *tmp;
+   int i;
+
+   if (mopts.malloc_freecheck) {
+   for (i = 0; i <= MALLOC_DELAYED_CHUNK_MASK; i++)
+   if (p == pool->delayed_chunks[i])
+   wrterror(pool,
+   "double free %p", p);
+   }
if (mopts.malloc_junk && sz > 0)
memset(p, SOME_FREEJUNK, sz);
i = getrbyte(pool) & MALLOC_DELAYED_CHUNK_MASK;
@@ -1395,13 +1402,11 @@ ofree(struct dir_info *argpool, void *p,
p = pool->delayed_chunks[i];
if (tmp == p)
wrterror(pool, "double free %p", tmp);
+   pool->delayed_chunks[i] = tmp;
if (mopts.malloc_junk)
validate_junk(pool, p);
-   pool->delayed_chunks[i] = tmp;
-   } else {
-   if ((clear || mopts.malloc_junk) && sz > 0)
-   memset(p, clear ? 0 : SOME_FREEJUNK, sz);
-   }
+   } else if (sz > 0)
+   memset(p, 0, sz);
if (p != NULL) {
r = find(pool, p);
if (r == NULL)
@@ -2348,7