On Thu, Jan 18, 2018 at 10:48:09AM +0000, kshe wrote:
> On Thu, 18 Jan 2018 08:54:21 +0000, Otto Moerbeek wrote:
> > Looking back the rotor thing is ill-convceived indeed. I'm now
> > testing the diff below. I still re-use r, because I really think a
> > little bit of correlation does not hurt here.
>
> Actually, I think it does hurt, because it introduces a lot of
> regularity in the allocation patterns. This could be exploited by an
> attacker in some way, as it makes it much easier to reason about the
> precise layout of those pages than an unbiased distribution would. For
> example, for chunks of size MALLOC_MAXCHUNK / 2, there should be 24
> different ways to fill a given page, but currently only four of them are
> possible: 0123, 1320, 2103 and 3012. This is even worse in the new diff
> that you suggest, where the only possible configurations are the most
> trivial ones: 0123, 1230, 2301 and 3012.
Hmmm, at last I see your point. Somehow I developed a blind spot for
this. Thanks for pointing that out.
>
> Likewise, for all the other sizes (except MALLOC_MAXCHUNK of course),
> the number of possible patterns that can arise from the current code (or
> the one that you suggest) is considerably lower than the expected n!
> permutations that would be generated by an unbiased approach. Moreover,
> these patterns tend to be very regular and easy to predict, as seen in
> the above examples.
>
> Are random bytes really so expansive that it justifies such intentional
> biases to be introduced? I actually thought they were quite cheap,
> especially seeing as, on average, more than 25% of them are merely
> discarded (and never used) for the sole purpose of introducing noise in
> the arc4random_buf(3) calls...
>
> So what I would suggest is unconditionally initialising `r' with two
> getrbyte() calls. For all architectures, this gives enough random bits
> to choose the list to use as well as the offset to start at, both in a
> perfectly uniform way, without any correlation nor bias of any kind, and
> without the need for accumulating bits in an extra rotor variable or
> doing some other conditional expansion of the resulting offset when
> there are not enough bits in it. But, indeed, this method includes this
> one additional getrbyte() call compared to the current one. So the
> question is: is that really a problem?
>
> (By the way, your diff was also off by one in the check against
> UCHAR_MAX, but this is a different issue.)
>
> --- malloc.c.orig Sun Jan 14 11:18:10 2018
> +++ malloc.c Thu Jan 18 09:45:39 2018
> @@ -172,7 +172,6 @@ struct chunk_info {
> u_short free; /* how many free chunks */
> u_short total; /* how many chunks */
> u_short offset; /* requested size table offset */
> - u_short rotor; /* randomization rotor */
> u_short bits[1]; /* which chunks are free */
> };
>
> @@ -941,7 +940,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
>
> j = find_chunksize(size);
>
> - r = getrbyte(d);
> + r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
> listnum = r % MALLOC_CHUNK_LISTS;
> /* If it's empty, make a page more of that size chunks */
> if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
> @@ -953,9 +952,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
> if (bp->canary != (u_short)d->canary1)
> wrterror(d, "chunk info corrupted");
>
> - if (bp->free > 1)
> - bp->rotor += r;
> - i = bp->rotor++ & (bp->total - 1);
> + i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
>
> /* start somewhere in a short */
> lp = &bp->bits[i / MALLOC_BITS];
>
> Regards,
>
> kshe
I'll do some benchmarks.
-Otto