On Thu, Jan 18, 2018 at 12:05:48PM +0100, Otto Moerbeek wrote:
> On Thu, Jan 18, 2018 at 10:48:09AM +0000, kshe wrote:
>
> > On Thu, 18 Jan 2018 08:54:21 +0000, Otto Moerbeek wrote:
> > > Looking back the rotor thing is ill-convceived indeed. I'm now
> > > testing the diff below. I still re-use r, because I really think a
> > > little bit of correlation does not hurt here.
> >
> > Actually, I think it does hurt, because it introduces a lot of
> > regularity in the allocation patterns. This could be exploited by an
> > attacker in some way, as it makes it much easier to reason about the
> > precise layout of those pages than an unbiased distribution would. For
> > example, for chunks of size MALLOC_MAXCHUNK / 2, there should be 24
> > different ways to fill a given page, but currently only four of them are
> > possible: 0123, 1320, 2103 and 3012. This is even worse in the new diff
> > that you suggest, where the only possible configurations are the most
> > trivial ones: 0123, 1230, 2301 and 3012.
>
> Hmmm, at last I see your point. Somehow I developed a blind spot for
> this. Thanks for pointing that out.
>
> >
> > Likewise, for all the other sizes (except MALLOC_MAXCHUNK of course),
> > the number of possible patterns that can arise from the current code (or
> > the one that you suggest) is considerably lower than the expected n!
> > permutations that would be generated by an unbiased approach. Moreover,
> > these patterns tend to be very regular and easy to predict, as seen in
> > the above examples.
> >
> > Are random bytes really so expansive that it justifies such intentional
> > biases to be introduced? I actually thought they were quite cheap,
> > especially seeing as, on average, more than 25% of them are merely
> > discarded (and never used) for the sole purpose of introducing noise in
> > the arc4random_buf(3) calls...
> >
> > So what I would suggest is unconditionally initialising `r' with two
> > getrbyte() calls. For all architectures, this gives enough random bits
> > to choose the list to use as well as the offset to start at, both in a
> > perfectly uniform way, without any correlation nor bias of any kind, and
> > without the need for accumulating bits in an extra rotor variable or
> > doing some other conditional expansion of the resulting offset when
> > there are not enough bits in it. But, indeed, this method includes this
> > one additional getrbyte() call compared to the current one. So the
> > question is: is that really a problem?
> >
> > (By the way, your diff was also off by one in the check against
> > UCHAR_MAX, but this is a different issue.)
> >
> > --- malloc.c.orig Sun Jan 14 11:18:10 2018
> > +++ malloc.c Thu Jan 18 09:45:39 2018
> > @@ -172,7 +172,6 @@ struct chunk_info {
> > u_short free; /* how many free chunks */
> > u_short total; /* how many chunks */
> > u_short offset; /* requested size table offset */
> > - u_short rotor; /* randomization rotor */
> > u_short bits[1]; /* which chunks are free */
> > };
> >
> > @@ -941,7 +940,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
> >
> > j = find_chunksize(size);
> >
> > - r = getrbyte(d);
> > + r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
> > listnum = r % MALLOC_CHUNK_LISTS;
> > /* If it's empty, make a page more of that size chunks */
> > if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
> > @@ -953,9 +952,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
> > if (bp->canary != (u_short)d->canary1)
> > wrterror(d, "chunk info corrupted");
> >
> > - if (bp->free > 1)
> > - bp->rotor += r;
> > - i = bp->rotor++ & (bp->total - 1);
> > + i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
> >
> > /* start somewhere in a short */
> > lp = &bp->bits[i / MALLOC_BITS];
> >
> > Regards,
> >
> > kshe
>
> I'll do some benchmarks.
Diff committed. The random calls have a neglible effect on performance.
-Otto