Re: hardened memory allocator port to linux-fedora system for security

2022-09-06 Thread Siddhesh Poyarekar
On Sat, Aug 27, 2022 at 9:14 AM Carlos O'Donell  wrote:
> (2) Switching the default vs. improving the default.
>

A third option (or maybe it's an improvement to the default?), since
the choice of allocators comes up consistently, would be to seriously
consider making it easier to switch allocators, though that is likely
not a trivial project.

However, to echo what Timothée said, there's value in packaging
hardened_malloc for Fedora to make it available to users.  It's much
too early to start talking about switching defaults IMO.

Sid


Re: hardened memory allocator port to linux-fedora system for security

2022-08-30 Thread Timothée Ravier
I think that the first steps here would be to:
- package it in Fedora
- write a documentation page on how to use it (the quick docs may be a good
  place: https://docs.fedoraproject.org/en-US/quick-docs/)
- do a lot of testing and benchmarks to get memory and performance numbers
  for each major Fedora use case (workstation, server, IoT, etc.)


Re: hardened memory allocator port to linux-fedora system for security

2022-08-29 Thread Daniel Micay via devel
On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
> On 8/13/22, Demi Marie Obenour wrote:
> > On 8/13/22, Kevin Kofler via devel wrote:
> > > martin luther wrote:
> > > > should we implement https://github.com/GrapheneOS/hardened_malloc/ ?
> > > > It is a hardened memory allocator that would increase the security of
> > > > Fedora. According to the GrapheneOS team it can be ported to Linux as
> > > > well; we need to look at it.
> > 
> > CCing Daniel Micay who wrote hardened_malloc.
> > 
> > > There are several questions that come up:  [[snip]]
> 
> It seems to me that hardened_malloc could increase working set and RAM
> desired by something like 10% compared to glibc for some important workloads,
> such as Fedora re-builds.  From page 22 of [1] (attached here; 203KB), the graph
> of number of requests versus requested size shows that blocks of size <= 128
> were requested tens to thousands of times more often than all the rest.

In the lightweight configuration, hardened_malloc uses substantially less
memory for small allocations than glibc malloc.

None of the GrapheneOS or hardened_malloc developers or project members
has proposed that Fedora switch to hardened_malloc, but it would reduce
rather than increase memory usage if you used it without the slab
quarantine features. Slab canaries use extra memory too, but the
overhead is lower than glibc's metadata overhead. The sample lightweight
configuration still uses slab canaries.

If you bolted on a jemalloc-style array-based thread cache, or a
problematic TCMalloc-style one like the one copied for glibc, then you
would get comparable performance and better scalability than glibc
malloc, but that is outside the scope of what hardened_malloc is
intended to provide. We aren't trying to serve that niche with
hardened_malloc. That does not mean glibc malloc is well suited to
being the chosen allocator; that really can't be justified on any
technical grounds. If you replaced glibc malloc with jemalloc, the only
people who would be unhappy are those who care about the loss of ASLR
bits from chunk alignment, which, if you make the chunks small enough
and configure ASLR properly, really doesn't matter on 64-bit. I can't
think of a case where glibc malloc would be better than jemalloc with
small chunk sizes when using either 4k pages with a 48-bit address
space or larger pages. glibc malloc's overall design is simply not
competitive anymore, and it wastes tons of memory through both metadata
overhead and fragmentation. I can't really see what justification there
would be for not replacing it outright with a more modern design and
adding the necessary additional APIs, as we did ourselves for our own
security-focused allocator.
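
For readers unfamiliar with the term, here is a rough sketch of what an
array-based per-thread cache looks like (a generic illustration of the
technique, not jemalloc's, TCMalloc's, or glibc's actual code; the
32-byte size class and 16-slot cache are arbitrary, and the shared-arena
slow path is stubbed out with plain malloc/free):

#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

/* One tiny per-thread magazine for a single size class (32 bytes here).
 * Frees push into the array and allocations pop from it; only when the
 * cache is empty or full does the thread fall back to the shared
 * (locked) arena, stubbed out below with plain malloc/free. */
struct thread_cache {
    void    *slots[CACHE_SLOTS];
    unsigned count;
};

static _Thread_local struct thread_cache tcache_32;

static void *cached_alloc(void)
{
    if (tcache_32.count > 0)
        return tcache_32.slots[--tcache_32.count];  /* lock-free fast path */
    return malloc(32);                              /* slow path: shared arena */
}

static void cached_free(void *p)
{
    if (tcache_32.count < CACHE_SLOTS) {
        tcache_32.slots[tcache_32.count++] = p;     /* keep for local reuse */
        return;
    }
    free(p);                                        /* cache full: give back */
}

int main(void)
{
    void *p = cached_alloc();
    cached_free(p);                 /* lands in the thread cache */
    void *q = cached_alloc();       /* served straight from the cache */
    printf("reused from thread cache: %s\n", p == q ? "yes" : "no");
    free(q);
    return 0;
}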

> For sizes from 0 through 128, the "Size classes" section of README.md of [2]
> documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%.
> That seems too high.  Where are actual measurements for workloads such as
> Fedora re-builds?

The minimum alignment is 16 bytes. In practice, glibc malloc has far
more metadata overhead, internal fragmentation, and external
fragmentation than hardened_malloc. It puts headers on allocations,
rounds to much coarser bucket sizes, and fragments all the memory with
the traditional dlmalloc-style approach. There was a time when that
approach was a massive improvement over earlier ones, but that time was
the 90s, not 2022.

> (Also note that the important special case of malloc(0), which is analogous
> to (gensym) of Lisp and is implemented internally as malloc(1), consumes
> 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc.
> The worst fragmentation happens for *every* call to malloc(0), which occurred
> about 800,000 times in the sample.  Yikes!)

glibc malloc has headers giving it more than 100% pure overhead for a
16 byte allocation. It cannot do finer-grained rounding than we do for
16 through 128 bytes, and sticking headers on allocations makes it far
worse. It also gets even worse with aligned allocations, such as the
common 64-byte-aligned allocations: with slab allocation, any
allocation up to the page size already has its natural alignment
(64-byte alignment for a 64-byte allocation, 128 for 128, 256 for 256,
and so on).

Zero-byte allocations don't really make sense to compare, because in
hardened_malloc malloc(0) returns a pointer into non-allocated pages
with PROT_NONE memory protection.
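
A sketch of the zero-size handling described above (an assumed shape for
illustration only, not hardened_malloc's actual code): zero-byte requests
get distinct pointers into a region that is never made readable or
writable, so they consume no usable memory and any dereference faults
immediately.

#include <stdio.h>
#include <sys/mman.h>

/* Reserve an inaccessible region once and hand out distinct pointers
 * from it for zero-byte requests: nothing readable or writable is ever
 * mapped behind them, so any dereference faults immediately. */
static char  *zero_region;
static size_t zero_next;

static void *alloc_zero_size(void)
{
    if (!zero_region) {
        void *r = mmap(NULL, 1 << 20, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (r == MAP_FAILED)
            return NULL;
        zero_region = r;
    }
    return zero_region + 16 * zero_next++;   /* unique, 16-byte aligned */
}

int main(void)
{
    void *a = alloc_zero_size();
    void *b = alloc_zero_size();
    printf("zero-size pointers %p and %p have no usable memory behind them\n",
           a, b);
    return 0;
}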


Re: hardened memory allocator port to linux-fedora system for security

2022-08-27 Thread Carlos O'Donell
On 8/26/22 12:22, Daniel Micay via devel wrote:
> Also, hardened_malloc doesn't use a thread cache for security
> reasons. It invalidates many of the security properties. If you compare
> to glibc malloc in the light configuration with tcache disabled in glibc
> malloc it will compare well, and hardened_malloc can scale better when
> given enough arenas. If you want to make the substantial security
> sacrifices required for a traditional thread cache, then I don't think
> hardened_malloc makes sense, which is why it doesn't include the option
> to do thread caching even though it'd be easy to implement. It may one
> day include the option to do thread batched allocation, but it isn't
> feasible to do it for deallocation without losing a ton of the strong
> security properties.

I'm an upstream glibc developer, but I've tried to remove my bias here and
present the facts as they are for the existing heap-based allocator that is
in use by the distributions today and why it's hard to change.

(1) Pick your own allocator vs. use the default.

We allow any end user to make those choices by interposing the final
allocator with an allocator of their choice depending on specific workload
criteria. This means that distributions don't have a strong incentive to
change system allocators unless they are making a strategic change in their
core values or vision for the distribution (like Graphene OS makes for
security).

At the ELF level we make sure that we can interpose a new allocator, and we work
carefully to ensure that newer features at the compiler level can be supported
incrementally (_FORTIFY_SOURCE=3 and __builtin_dynamic_object_size) by newer
allocators.

In summary: If the "good enough" allocator doesn't meet your requirements, then
you can use one of the alternatives.
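
As an illustration of the interposition mechanism described in (1), a
minimal sketch of an interposable allocator follows. It is deliberately
naive (one anonymous mapping per allocation, no memalign family, no
thread caching) and is not how glibc, hardened_malloc, or any production
allocator works; it only shows that exporting the standard entry points
and loading the object ahead of libc is enough for the dynamic linker to
interpose it.

/* toy_malloc.c: build with  gcc -shared -fPIC -o libtoy_malloc.so toy_malloc.c
 * then run a program as     LD_PRELOAD=./libtoy_malloc.so ./some_program
 * The dynamic linker resolves malloc/free/calloc/realloc to these
 * definitions before libc's, which is the same interposition path an
 * alternative allocator package would rely on. */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define HDR 16   /* keeps the returned pointer 16-byte aligned */

void *malloc(size_t n)
{
    size_t total = HDR + (n ? n : 1);
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *(size_t *)p = total;             /* remember the mapping length */
    return (char *)p + HDR;
}

void free(void *p)
{
    if (p) {
        char *base = (char *)p - HDR;
        munmap(base, *(size_t *)base);
    }
}

void *calloc(size_t nmemb, size_t size)
{
    if (size && nmemb > (size_t)-1 / size)
        return NULL;                  /* overflow check */
    return malloc(nmemb * size);      /* fresh mmap memory is already zeroed */
}

void *realloc(void *p, size_t n)
{
    if (!p)
        return malloc(n);
    size_t old = *(size_t *)((char *)p - HDR) - HDR;
    void *q = malloc(n);
    if (q)
        memcpy(q, p, old < n ? old : n);
    free(p);
    return q;
}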

(2) Switching the default vs. improving the default.

It is arguably lower TCO for all distributions using glibc to improve glibc's
malloc. Some improvements can't be made, but some buy enough benefit that there
is no strong reason to change allocators.

For example:
- jemalloc/tcmalloc used a fast per-thread cache.
  - glibc implemented fast per-thread caching in 2.26 (2017) (DJ Delorie's work)

- Chromium started using safe-linking pointer hardening.
  - glibc implemented safe-linking pointer hardening for fastbins and tcache (2020) (Eyal Itkin's work); see the sketch below
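
For reference, a simplified sketch of the safe-linking idea mentioned in
the list above: the singly linked free-list pointer is stored XORed with
the page-shifted address of the slot that holds it, so forging or
corrupting a link requires knowing the heap's ASLR-randomized location
(modeled on the published scheme; glibc's actual macros differ in detail).

#include <stdint.h>
#include <stdio.h>

/* A singly linked free-list "next" pointer is never stored raw: it is
 * XORed with the address of the slot holding it, shifted right by the
 * page-offset bits (12 for 4K pages). Without a heap address leak an
 * attacker cannot forge a usable link, and many corruptions become
 * detectable as misaligned pointers. */
static uintptr_t protect_ptr(void **slot, void *next)
{
    return ((uintptr_t)slot >> 12) ^ (uintptr_t)next;
}

static void *reveal_ptr(void **slot, uintptr_t stored)
{
    return (void *)(((uintptr_t)slot >> 12) ^ stored);
}

int main(void)
{
    int chunk;                       /* stand-in for a freed chunk */
    void *slot[1];                   /* where the link would live  */

    uintptr_t stored = protect_ptr(slot, &chunk);  /* on free     */
    void *recovered = reveal_ptr(slot, stored);    /* on allocate */

    printf("stored %#lx, recovered %p, original %p\n",
           (unsigned long)stored, recovered, (void *)&chunk);
    return 0;
}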

The next steps for glibc's malloc are probably:

- Improve internal fragmentation [1]
- Round-robin arena assignment with uniform arena assignment as a goal.
- Provide a packed arena for sub-16-byte allocations to improve utilization.
  - We have seen some C++ workloads/frameworks that create trillions of
    13-byte objects (see the measurement sketch below).
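
As referenced in the list above, a quick way to see why sub-16-byte
allocations deserve a packed arena is to measure what glibc hands back
today; the usable size depends on the glibc version and architecture, so
treat the numbers in the comment as typical rather than guaranteed.

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* On a typical 64-bit glibc build a 13-byte request comes back as
     * the minimum 32-byte chunk with 24 usable bytes, so most of the
     * footprint is overhead for a workload made of such objects; a
     * packed arena for tiny sizes targets exactly that waste. */
    void *p = malloc(13);
    printf("requested 13 bytes, usable size %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}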

(3) Requirements vs. change.

While Facebook/BSD (jemalloc), Google (tcmalloc), Microsoft (mimalloc) have very
good allocators, issues seen with those allocators can be more difficult to
correct because of the impact those changes have on wider workloads beyond
distribution workloads.

For example, if Graphene OS (with its own goals) and Fedora (with its own
goals) had a conflict of interest over the direction of the allocator,
e.g. cost vs. security, what kind of choice would the hardened_malloc
maintainers make?

Upstream glibc has largely been aligned with traditional distribution
requirements for a long time, and continues to be aligned with the notion
of a "general purpose" distribution via the contributors and deep network
of developers in the distributions:
https://sourceware.org/glibc/wiki/MAINTAINERS#Distribution_Maintainers

---

The combination of (1), (2) and (3) means that, for general purpose
distributions, staying with glibc's malloc yields an ecosystem of
distributions that use the same allocator and benefit from wide
application testing, development, and support when required.

It would be easier to approach glibc upstream and convince them that the
default allocator in glibc should be replaced with hardened_malloc or
jemalloc or tcmalloc or mimalloc...

-- 
Cheers,
Carlos.

[1] https://patchwork.sourceware.org/project/glibc/patch/xn4jz19fts@greed.delorie.com/


Re: hardened memory allocator port to linux-fedora system for security

2022-08-26 Thread Daniel Micay via devel
On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
> On 8/13/22, Demi Marie Obenour wrote:
> > On 8/13/22, Kevin Kofler via devel wrote:
> > > martin luther wrote:
> > > > should we implement https://github.com/GrapheneOS/hardened_malloc/ ?
> > > > It is a hardened memory allocator that would increase the security of
> > > > Fedora. According to the GrapheneOS team it can be ported to Linux as
> > > > well; we need to look at it.
> > 
> > CCing Daniel Micay who wrote hardened_malloc.
> > 
> > > There are several questions that come up:  [[snip]]
> 
> It seems to me that hardened_malloc could increase working set and RAM
> desired by something like 10% compared to glibc for some important workloads,
> such as Fedora re-builds.  From page 22 of [1] (attached here; 203KB), the graph
> of number of requests versus requested size shows that blocks of size <= 128
> were requested tens to thousands of times more often than all the rest.

It has far less fragmentation than glibc malloc. It also has far lower
metadata overhead since there are no headers on allocations and only a
few bits consumed per small allocation. glibc has over 100% metadata
overhead for 16 byte allocations while for hardened_malloc it's a very
low percentage. Of course, you need to compare with slab allocation
quarantines and slab allocation canaries disabled in hardened_malloc.

> For sizes from 0 through 128, the "Size classes" section of README.md of [2]
> documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%.
> That seems too high.  Where are actual measurements for workloads such as
> Fedora re-builds?

Internal fragmentation here means fragmentation caused by size-class rounding. There
is no way to have size classes that aren't multiples of 16 due to it
being required by the x86_64 and arm64 ABI. glibc has over 100% overhead
for 16 byte allocations due to header metadata and other metadata. It
definitely isn't lighter for those compared to a modern slab allocator.

There's a 16 byte alignment requirement for malloc on x86_64 and arm64
so there's no way to have any size classes between the initial multiples
of 16.
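
The README's worst-case figures follow directly from that 16-byte
spacing; a quick check that reproduces the quoted 93.75% and 11.72%
endpoints:

#include <stdio.h>

int main(void)
{
    /* Worst-case internal fragmentation for 16-byte-spaced size classes:
     * the smallest request that still lands in class c is c - 15
     * (or 1 for the 16-byte class), so up to 15 bytes of the slot are
     * wasted. */
    for (unsigned c = 16; c <= 128; c += 16) {
        unsigned smallest = (c == 16) ? 1 : c - 15;
        printf("class %3u: worst-case request %3u -> %5.2f%% wasted\n",
               c, smallest, 100.0 * (c - smallest) / c);
    }
    return 0;
}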

Slab allocation canaries are an optional hardened_malloc feature adding
8 byte random canaries to the end of allocations, which in many cases
will increase the size class if there isn't room within the padding.
Slab allocation quarantines are another optional feature which require
dedicating substantial memory to avoiding reuse of allocations.

You should compare without the optional features enabled as a baseline
because glibc doesn't have any of those security features, and the
baseline hardened_malloc design is far more secure.
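
For readers who want to see the mechanism, a rough sketch of the
slab-canary idea (an illustration of the general technique, not
hardened_malloc's layout; the 32-byte slot, 20-byte request, and fixed
secret are arbitrary stand-ins, and the real feature uses random values):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOT_SIZE   32            /* size class of the slot */
#define CANARY_SIZE 8

static uint64_t canary_secret = 0x5aa5c33c0ff1ce42ULL;  /* would be random */

/* The canary sits in the slack between the requested size and the slot
 * size; if request + 8 no longer fits, the allocation is bumped to the
 * next size class, which is where the extra memory cost comes from. */
static void place_canary(unsigned char *slot, size_t user_size)
{
    memcpy(slot + user_size, &canary_secret, CANARY_SIZE);
}

static void check_canary(const unsigned char *slot, size_t user_size)
{
    if (memcmp(slot + user_size, &canary_secret, CANARY_SIZE) != 0) {
        fprintf(stderr, "heap overflow detected: canary clobbered\n");
        abort();
    }
}

int main(void)
{
    unsigned char slot[SLOT_SIZE];
    size_t n = 20;                   /* user asked for 20 bytes */
    place_canary(slot, n);
    memset(slot, 'A', n + 1);        /* linear overflow by one byte */
    check_canary(slot, n);           /* aborts with a diagnostic */
    return 0;
}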

> (Also note that the important special case of malloc(0), which is analogous
> to (gensym) of Lisp and is implemented internally as malloc(1), consumes
> 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc.
> The worst fragmentation happens for *every* call to malloc(0), which occurred
> about 800,000 times in the sample.  Yikes!)

malloc(0) is not implemented as malloc(1) in hardened_malloc and does
not use any memory for the data, only the metadata, which is a small
percentage of the allocation size even for 16 byte allocations since
there is only slab metadata for the entire slab and bitmaps to track
which slots are used. There are no allocation headers.

Doing hundreds of thousands of malloc(0) allocations only uses a few
bytes of memory in hardened_malloc. Each allocation requires a bit in
the bitmap and each slab of 256x 16 byte allocations (4096 byte slab)
has slab metadata. All the metadata is in a dedicated metadata region.
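
A minimal sketch of that kind of out-of-band bookkeeping (one bit per
16-byte slot in a 4096-byte slab, with the metadata held in a separate
structure; this illustrates the idea rather than hardened_malloc's
actual data layout):

#include <stdint.h>
#include <stdio.h>

#define SLAB_SIZE  4096
#define SLOT_SIZE  16
#define SLOTS      (SLAB_SIZE / SLOT_SIZE)   /* 256 slots */

/* Out-of-band metadata: it lives in a separate region, so the
 * allocations themselves carry no headers at all. */
struct slab_meta {
    uint64_t bitmap[SLOTS / 64];   /* 1 bit per slot: 32 bytes total */
    void    *slab_base;            /* start of the 4096-byte slab    */
};

static void *slab_alloc(struct slab_meta *m)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        uint64_t bit = UINT64_C(1) << (i % 64);
        if (!(m->bitmap[i / 64] & bit)) {
            m->bitmap[i / 64] |= bit;                 /* mark used */
            return (char *)m->slab_base + i * SLOT_SIZE;
        }
    }
    return NULL;                                      /* slab full */
}

static void slab_free(struct slab_meta *m, void *p)
{
    unsigned i = (unsigned)(((char *)p - (char *)m->slab_base) / SLOT_SIZE);
    m->bitmap[i / 64] &= ~(UINT64_C(1) << (i % 64));  /* mark free */
}

int main(void)
{
    static char slab[SLAB_SIZE] __attribute__((aligned(16)));
    struct slab_meta m = { .slab_base = slab };
    void *a = slab_alloc(&m), *b = slab_alloc(&m);
    printf("a=%p b=%p (metadata: %zu bytes for %d slots)\n",
           a, b, sizeof m.bitmap, SLOTS);
    slab_free(&m, a);
    slab_free(&m, b);
    return 0;
}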

I strongly recommend reading all the documentation thoroughly:

https://github.com/GrapheneOS/hardened_malloc/blob/main/README.md

hardened_malloc is oriented towards security and provides a bunch of
important security properties unavailable with glibc malloc. It also has
lower fragmentation and, with the optional security features disabled,
lower memory usage for large processes, especially over time. If
you enable the slab quarantines, that's going to use a lot of memory. If
you enable slab canaries, you give up some of the memory usage reduction
from not having per-allocation metadata headers. Neither of those
features exists in glibc malloc, jemalloc, etc. so it's not really fair
to enable the optional security features for hardened_malloc and compare
with allocators without them.

Slab allocation quarantines in particular inherently require a ton of
memory in order to delay reuse of allocations for as long of a time as
is feasible. This pairs well with zero-on-free + write-after-free-check
based on zero-on-free, since if any non-zero write occurs while
quarantined/freed it will be detected before the allocation is reused.
As long as zero-on-free is enabled, which it is even in the sample
light configuration, all memory is known to be zeroed at allocation
time, which is how the write-after-free check works. All of 
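
A minimal sketch of the zero-on-free plus write-after-free-check
interplay described above (illustrative only; hardened_malloc's real
implementation differs): the free path wipes the slot, and the
allocation path verifies it is still all zero before reusing it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOT 16

/* Free path: wipe the slot so its contents are known to be zero. */
static void quarantine_free(unsigned char *slot)
{
    memset(slot, 0, SLOT);
    /* ...the slot would now sit in a quarantine, delaying its reuse... */
}

/* Allocation path: any non-zero byte means something wrote to the slot
 * while it was free, i.e. a use-after-free has been caught. */
static unsigned char *checked_alloc(unsigned char *slot)
{
    for (int i = 0; i < SLOT; i++) {
        if (slot[i] != 0) {
            fprintf(stderr, "write-after-free detected at offset %d\n", i);
            abort();
        }
    }
    return slot;   /* the caller also gets zeroed memory for free */
}

int main(void)
{
    static unsigned char slot[SLOT];
    quarantine_free(slot);
    slot[3] = 0x41;              /* simulate a dangling write */
    checked_alloc(slot);         /* aborts with a diagnostic  */
    return 0;
}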

Re: hardened memory allocator port to linux-fedora system for security

2022-08-13 Thread Demi Marie Obenour
On 8/13/22 08:04, Kevin Kofler via devel wrote:
> martin luther wrote:
>> should we implement https://github.com/GrapheneOS/hardened_malloc/ ?
>> It is a hardened memory allocator that would increase the security of
>> Fedora. According to the GrapheneOS team it can be ported to Linux as
>> well; we need to look at it.

CCing Daniel Micay who wrote hardened_malloc.

> There are several questions that come up:
> * Against what exact threats does this protect? Use-after-free? Heap buffer 
> overflow? Others?
> * How does it relate to _FORTIFY_SOURCE? Can they be used together? (If not, 
> it might actually reduce rather than increase the security of Fedora.)
> * How does it perform, both in terms of speed and memory consumption 
> (overhead)? Better or worse than the glibc malloc? (If it is much worse than 
> the glibc malloc, it is not going to be a suitable default for Fedora.)
> * How does it compare to the glibc malloc in terms of quality of 
> implementation issues, such as that realloc should avoid copying the whole 
> block whenever an in-place resize is possible?
> * Can hardening be added to the existing glibc malloc implementation or is a 
> complete rewrite as the suggested one really needed?
> * How do you suggest it getting used distro-wide instead of the glibc 
> implementation? Upstream's suggestion is to link it as an additional dynamic 
> shared object, so then the order of linking is important, and you also have 
> to take care to link it into all applications (and there are lots of build 
> systems out there). The alternative, I suppose, would be to modify glibc.
> 
> Kevin Kofler

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)



Re: hardened memory allocator port to linux-fedora system for security

2022-08-13 Thread Kevin Kofler via devel
martin luther wrote:
> should we implement https://github.com/GrapheneOS/hardened_malloc/ ?
> It is a hardened memory allocator that would increase the security of
> Fedora. According to the GrapheneOS team it can be ported to Linux as
> well; we need to look at it.

There are several questions that come up:
* Against what exact threats does this protect? Use-after-free? Heap buffer 
overflow? Others?
* How does it relate to _FORTIFY_SOURCE? Can they be used together? (If not, 
it might actually reduce rather than increase the security of Fedora.)
* How does it perform, both in terms of speed and memory consumption 
(overhead)? Better or worse than the glibc malloc? (If it is much worse than 
the glibc malloc, it is not going to be a suitable default for Fedora.)
* How does it compare to the glibc malloc in terms of quality of 
implementation issues, such as that realloc should avoid copying the whole 
block whenever an in-place resize is possible?
* Can hardening be added to the existing glibc malloc implementation or is a 
complete rewrite as the suggested one really needed?
* How do you suggest it getting used distro-wide instead of the glibc 
implementation? Upstream's suggestion is to link it as an additional dynamic 
shared object, so then the order of linking is important, and you also have 
to take care to link it into all applications (and there are lots of build 
systems out there). The alternative, I suppose, would be to modify glibc.

Kevin Kofler


hardened memory allocator port to linux-fedora system for security

2022-08-13 Thread martin luther
should we implement https://github.com/GrapheneOS/hardened_malloc/ ?
It is a hardened memory allocator that would increase the security of Fedora.
According to the GrapheneOS team it can be ported to Linux as well; we need
to look at it.