Re: hardened malloc is big and slow
On Wed, Sep 07, 2022 at 08:39:56AM -0700, John Reiser wrote:
> On 9/5/22 19:45, Daniel Micay wrote:
> > On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
> > > > Bottom line opinion: hardened_malloc ... costs too much.
> > >
> > > Attempting to be constructive: Psychologically, I might be willing to pay
> > > a "security tax" of something like 17%, partly on the basis of similarity
> > > to the VAT rate (Value Added Tax) in some parts of the developed world.
> >
> > The comparison is being done incorrectly. Since hardened_malloc builds
> > both a lightweight and heavyweight library by default,
>
> That claim is false.

You're not following the official approach to packaging and installing hardened_malloc. It has 2 official build configurations, and packaging that's done properly includes both. We don't currently define other configurations, but we could define a 'lightest' one too. I've given both concise and detailed explanations here, which you've gone out of your way to ignore.

> The Makefile for commit 72fb3576f568481a03076c62df37984f96bfdfeb
> on Tue Aug 16 07:47:26 2022 -0400 (which is the HEAD of the trunk) begins
> =
> VARIANT := default
>
> ifneq ($(VARIANT),)
> CONFIG_FILE := config/$(VARIANT).mk
> include config/$(VARIANT).mk
> endif
>
> ifeq ($(VARIANT),default)
> SUFFIX :=
> else
> SUFFIX := -$(VARIANT)
> endif
>
> OUT := out$(SUFFIX)
> =
> and builds only one library, namely $OUT/libhardened_malloc$SUFFIX.so
> which for the case of "no options specified" is out/libhardened_malloc.so .
>
> It would be better for external perception if the name "libhardened_malloc.so"
> were changed to something like "libhardened_malloc-strong.so".
> Having both -strong and -light versions built every time
> would highlight the difference, and force the user to decide,
> and encourage the analysis that is required to make an informed choice.

The 2 default configurations are not the only choices.
The light configuration still has full zero-on-free and canaries enabled. If we felt like matching or even exceeding glibc malloc performance on microbenchmarks, we could add an optional thread cache and a performance configuration, but that's not the point of the project at all, and glibc malloc is not a high performance allocator. hardened_malloc can provide similar performance with all optional features disabled vs. glibc malloc with tcache disabled. If hardened_malloc had array-based thread caching added (free lists would lose even the very basic 100% out-of-line metadata security property), then with optional features disabled it would be comparable to the default glibc malloc configuration.

We've already done extensive testing. There's no thread cache included because it simply isn't within the scope of the project. It's a hardened allocator, and a thread cache bypasses hardening and makes invalid free detection, randomization, quarantines, and other features not work properly. It has been tested with a thread cache; we know the impact of it. I don't think it makes sense to use a hardened allocator with one.

> > already explained this and that the lightweight library still has
> > optional security features enabled, it doesn't seem to have been done in
> > good faith. My previous posts where I provided both concise and detailed
> > information explaining differences and the approach were ignored. Why is
> > that?
> >
> > As I said previously, hardened_malloc has a baseline very hardened
> > allocator design. It also has entirely optional, expensive security
> > features layered on top of that. I explained in detail that some of
> > those features have a memory cost. Slab allocation canaries have a small
> > memory cost and slab allocation quarantines have a very large memory
> > cost especially with the default configuration. Those expensive optional
> > features each have an added performance cost too.
> >
> > Measuring with 100% of the expensive optional features enabled and
> > trying to portray the performance of the allocator solely based on that
> > is simply incredibly misleading and disregards all of my previous posts
> > in the thread.
>
> I measured the result of building and using with the default options.
> Unpack the source, use "as-is" with no adjustment, no tweaking, no tuning.
> If the default source is not appropriate to use as widely as implied
> by the name "malloc" (with no prefix and no suffix on the subroutine name),
> then the package is not suitable for general use.
> Say so immediately at the beginning of the README.md: "This software
> is not suitable for widespread general use, unless adjusted according to
> the actual use cases."

The hardened_malloc project is perfectly suitable for general purpose use, and heavily preferring security over both performance and memory usage for one of the 2 default configurations doesn't make it any less general purpose. The chosen compromises do not impact whether or not it is a general purpose allocator. Both default configurations are general purpose.
Re: hardened malloc is big and slow
On 9/5/22 19:45, Daniel Micay wrote:
> On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
> > Bottom line opinion: hardened_malloc ... costs too much.
> >
> > Attempting to be constructive: Psychologically, I might be willing to pay
> > a "security tax" of something like 17%, partly on the basis of similarity
> > to the VAT rate (Value Added Tax) in some parts of the developed world.
>
> The comparison is being done incorrectly. Since hardened_malloc builds
> both a lightweight and heavyweight library by default,

That claim is false. The Makefile for commit 72fb3576f568481a03076c62df37984f96bfdfeb
on Tue Aug 16 07:47:26 2022 -0400 (which is the HEAD of the trunk) begins
=
VARIANT := default

ifneq ($(VARIANT),)
CONFIG_FILE := config/$(VARIANT).mk
include config/$(VARIANT).mk
endif

ifeq ($(VARIANT),default)
SUFFIX :=
else
SUFFIX := -$(VARIANT)
endif

OUT := out$(SUFFIX)
=
and builds only one library, namely $OUT/libhardened_malloc$SUFFIX.so
which for the case of "no options specified" is out/libhardened_malloc.so .

It would be better for external perception if the name "libhardened_malloc.so"
were changed to something like "libhardened_malloc-strong.so".
Having both -strong and -light versions built every time
would highlight the difference, and force the user to decide,
and encourage the analysis that is required to make an informed choice.

> and since I already explained this and that the lightweight library still has
> optional security features enabled, it doesn't seem to have been done in
> good faith. My previous posts where I provided both concise and detailed
> information explaining differences and the approach were ignored. Why is
> that?
>
> As I said previously, hardened_malloc has a baseline very hardened
> allocator design. It also has entirely optional, expensive security
> features layered on top of that. I explained in detail that some of
> those features have a memory cost. Slab allocation canaries have a small
> memory cost and slab allocation quarantines have a very large memory
> cost especially with the default configuration. Those expensive optional
> features each have an added performance cost too.
>
> Measuring with 100% of the expensive optional features enabled and
> trying to portray the performance of the allocator solely based on that
> is simply incredibly misleading and disregards all of my previous posts
> in the thread.

I measured the result of building and using with the default options.
Unpack the source, use "as-is" with no adjustment, no tweaking, no tuning.
If the default source is not appropriate to use as widely as implied
by the name "malloc" (with no prefix and no suffix on the subroutine name),
then the package is not suitable for general use.
Say so immediately at the beginning of the README.md: "This software
is not suitable for widespread general use, unless adjusted according to
the actual use cases."

> hardened_malloc builds both a lightweight and heavyweight library by
> default. The lightweight library still has 2 of the optional security
> features enabled. None of the optional security features is provided by
> glibc malloc and if you want to compare the baseline performance then
> none of those should be enabled for a baseline comparison. Take the
> light configuration, disable slab allocation canaries and full
> zero-on-free, and there you go.

I reported an end-to-end measurement and comparison based on data.
Where have you reported actual end-to-end measurements and comparisons?
[[snip]] ___ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
Re: hardened malloc is big and slow
On 9/5/22 21:02, Daniel Micay via devel wrote:
> On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
> > Adding Daniel for awareness.
>
> Why was the heavyweight rather than lightweight configuration used? Why
> compare with all the expensive optional security features enabled?

The default configuration was used. " ...; make" produces
out/libhardened_malloc.so and no other shared library.

> Even the lightweight configuration has 2 of the optional security
> features enabled: slab canaries and full zero-on-free. Both of those
> should be disabled to measure the baseline performance. Using the
> heavyweight configuration means having large slab allocation quarantines
> and not just zero-on-free but checking that data is still zeroed on
> allocation (which more than doubles the cost), slot randomization and
> multiple other features. It just doesn't make sense to turn security up
> to 11 with optional features and then present that as if it's the
> performance offered.

The use case is a builder and distributor of software packages to a
large, diverse audience. There is concern about the possibility of
malware attacking the build process, a "supply-chain attack". Of course
there are other protections already in place, but the possibility of
better protection is reasonable to investigate. A network search
revealed a dearth of end-to-end performance measurements, and/or
comparisons based on actual data.

> I'm here to provide clarifications about my project and to counter
> incorrect beliefs about it. I don't think it makes much sense for Fedora
> to use it as a default allocator but the claims being made about memory
> usage and performance are very wrong. I already responded and provided
> both concise and detailed explanations. I don't know what these nonsense
> measurements completely disregarding all that are meant to demonstrate.

I reported an actual measurement and comparison of two allocators using
commonly-available tools and a documented, repeatable methodology.
The choice of which two allocators is reasonable for the use case.

> [[snip]]
Re: hardened malloc is big and slow
On Mon, 2022-09-05 at 22:45 -0400, Daniel Micay via devel wrote:
> The comparison is being done incorrectly. Since hardened_malloc builds
> both a lightweight and heavyweight library by default, and since I
> already explained this and that the lightweight library still has
> optional security features enabled, it doesn't seem to have been done in
> good faith. My previous posts where I provided both concise and detailed
> information explaining differences and the approach were ignored. Why is
> that?

I agree. I decided to do a fairer test myself (I'm quite interested in
hardened_malloc).

First, I downloaded the source RPM for my current kernel:

dnf download --source kernel-5.19.6-200.fc36.x86_64

Then made both heavy and light variants:

sysctl -p /etc/sysctl.d/hardened_malloc.conf
make VARIANT=light

Set up the chroot:

mock -r fedora-36-x86_64 --init

Create our SRPM:

mock -r fedora-36-x86_64 --buildsrpm --spec kernel.spec --sources $PWD --resultdir $PWD

Now do the compilations:

cp out-light/libhardened_malloc.so .
./preload.sh /usr/bin/time mock -r fedora-36-x86_64 --rebuild kernel-5.19.6-200.fc36.src.rpm >light.out 2>&1
/usr/bin/time mock -r fedora-36-x86_64 --rebuild kernel-5.19.6-200.fc36.src.rpm >no_preload.out 2>&1
Re: hardened malloc is big and slow
On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
> Adding Daniel for awareness.

Why was the heavyweight rather than lightweight configuration used? Why compare with all the expensive optional security features enabled?

Even the lightweight configuration has 2 of the optional security features enabled: slab canaries and full zero-on-free. Both of those should be disabled to measure the baseline performance. Using the heavyweight configuration means having large slab allocation quarantines and not just zero-on-free but checking that data is still zeroed on allocation (which more than doubles the cost), slot randomization and multiple other features. It just doesn't make sense to turn security up to 11 with optional features and then present that as if it's the performance offered.

I'm here to provide clarifications about my project and to counter incorrect beliefs about it. I don't think it makes much sense for Fedora to use it as a default allocator, but the claims being made about memory usage and performance are very wrong. I already responded and provided both concise and detailed explanations. I don't know what these nonsense measurements completely disregarding all that are meant to demonstrate.

It's a huge hassle for me to respond here because I have no interest in this list and don't want to be subscribed to it. I didn't propose that Fedora use it and don't think it makes sense for Fedora. At the same time, I already explained in detail that glibc malloc is ALSO a very bad choice. Linux distributions not willing to sacrifice much for security would be better served by using jemalloc with small chunk sizes on 64 bit operating systems. ASLR is too low entropy on 32 bit to afford the sacrifice of a few bits for chunk alignment, though. jemalloc can be configured with extra sanity checks enabled and with certain very non-essential features disabled to provide a better balance of security vs. performance.
The defaults are optimized for long running server processes. It's very configurable, including by individual applications.

hardened_malloc builds both a lightweight and heavyweight library itself. The lightweight library still has the optional slab allocation canary and full zero-on-free features enabled. Both of those should be disabled to truly measure the baseline cost. None of those optional features is provided by glibc malloc. None of them is needed to get the benefits of hardened_malloc's 100% out-of-line metadata, 100% invalid free detection, entirely separate never reused address space regions for all allocator metadata and each slab allocation size class (which covers up to 128k by default), virtual memory quarantines + random guards for large allocations, etc.

The optional security features are optional because they're expensive. That's the point of building both a sample lightweight and heavyweight configuration by default. The lightweight configuration is essentially the recommended configuration if you aren't willing to make more significant sacrifices for security. It's not the highest performance configuration it offers, just a reasonable compromise.

Slab allocation canaries slightly increase memory usage. Slab allocation quarantines (disabled in the lightweight configuration, which is built by default) greatly increase memory usage, especially with the default configuration. The whole point of quarantines is that they delay reuse of the memory, and since these are slab allocations within slabs, the memory gets held onto.

If you wanted to measure the baseline performance, then you'd do as I suggested and measure with all the optional features disabled (at least the 2 enabled in the light configuration) and compare that to both glibc malloc and glibc malloc with tcache disabled.
I explained previously that hardened_malloc could provide an array-based thread cache as an opt-in feature, but currently that isn't done because it inherently reduces security: no more 100% reliable detection of all invalid frees, and a lot of other security properties lost. It also hardly makes sense to have optional features like quarantines and slot randomization underneath unless the thread caches are doing the same thing.

As I said previously, if you compare hardened_malloc with optional features disabled to glibc malloc with tcache disabled, it performs as well and has much lower fragmentation and lower metadata overhead. If you stick a small array-based thread cache onto hardened_malloc, then it can perform as well as glibc with much larger freelist-based thread caches, since it has a different approach to scaling with jemalloc-style arenas.
Re: hardened malloc is big and slow
On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
> > Bottom line opinion: hardened_malloc ... costs too much.
>
> Attempting to be constructive: Psychologically, I might be willing to pay
> a "security tax" of something like 17%, partly on the basis of similarity
> to the VAT rate (Value Added Tax) in some parts of the developed world.

The comparison is being done incorrectly. Since hardened_malloc builds both a lightweight and heavyweight library by default, and since I already explained this and that the lightweight library still has optional security features enabled, it doesn't seem to have been done in good faith. My previous posts where I provided both concise and detailed information explaining differences and the approach were ignored. Why is that?

As I said previously, hardened_malloc has a baseline very hardened allocator design. It also has entirely optional, expensive security features layered on top of that. I explained in detail that some of those features have a memory cost. Slab allocation canaries have a small memory cost and slab allocation quarantines have a very large memory cost, especially with the default configuration. Those expensive optional features each have an added performance cost too.

Measuring with 100% of the expensive optional features enabled and trying to portray the performance of the allocator solely based on that is simply incredibly misleading and disregards all of my previous posts in the thread.

hardened_malloc builds both a lightweight and heavyweight library by default. The lightweight library still has 2 of the optional security features enabled. None of the optional security features is provided by glibc malloc, and if you want to compare the baseline performance then none of those should be enabled for a baseline comparison. Take the light configuration, disable slab allocation canaries and full zero-on-free, and there you go.
I also previously explained that hardened_malloc does not include a thread cache for security reasons inherent to the concept of a thread cache. An array-based thread cache with out-of-line metadata would still hurt security, but would be a more suitable approach than a free list compromising the otherwise complete lack of inline metadata.

Compare hardened_malloc with the optional security features disabled to glibc malloc, and also to glibc malloc with tcache disabled. It's easy enough to stick a thread cache onto hardened_malloc, and if there was demand for that I could implement it in half an hour. At the moment, the current users of hardened_malloc don't want to sacrifice 100% reliable detection of invalid frees along with the many other benefits lost by doing that.
Re: hardened malloc is big and slow
Bottom line opinion: hardened_malloc ... costs too much.

Attempting to be constructive: Psychologically, I might be willing to pay a "security tax" of something like 17%, partly on the basis of similarity to the VAT rate (Value Added Tax) in some parts of the developed world.
Re: hardened malloc is big and slow
Adding Daniel for awareness.

Regards.
Pablo

On Wed., Aug. 31, 2022, 16:09, John Reiser wrote:
> Here is one end-to-end performance measurement of using hardened_malloc.
>
> sudo sh -c "echo 1 >/proc/sys/vm/drop_caches"
> /usr/bin/time rpmbuild -bc kernel-5.15.11-100.fc34.spec >rpmbuild.out 2>&1
>
> For glibc, the result was
> 19274.30user 2522.87system 1:49:06elapsed 332%CPU (0avgtext+0avgdata 3389052maxresident)k
> 148504inputs+217900040outputs (18221major+1005715216minor)pagefaults 0swaps
>
> For the same task, but preceded by
> export LD_PRELOAD=/usr/lib64/libhardened_malloc.so
> the result was
> 26108.73user 4805.55system 2:22:43elapsed 360%CPU (0avgtext+0avgdata 1881564maxresident)k
> 586704inputs+217900504outputs (31876major+1848825755minor)pagefaults 0swaps
>
> So compared to glibc-2.33-21.fc34.x86_64, hardened_malloc used
> 1.3 times as much wall clock (8563 / 6536 in seconds)
> 1.35 times as much user CPU (26108 / 19274)
> 1.9 times as much sys CPU (4805 / 2522).
>
> The environment was a physical machine running fedora 5.17.12-100.fc34.x86_64:
> Intel Core i5-6500 @3.2GHz (4 CPU, 4 cores, 256kB L2 cache per core, 6MB L3 shared)
> 32GB DDR4 RAM
> /usr ext4 on SSD, /data ext4 on 4TB spinning commodity hard drive
>
> In the .spec, I changed to:
> %define make_opts -j4
> so that much of the compiling ran 4 jobs in parallel.
> /usr/bin/top showed minimal use of swapspace: 4MB.
>
> hardened_malloc required (as documented in its README.md):
> - /etc/sysctl.d/hardened_malloc.conf
> # (Fedora 5.17.12) default is 65530 (2**16 - 6),
> # libhardened_malloc suggests 1048576 (2**20)
> # we choose 1048570 (2**20 - 6)
> vm.max_map_count = 1048570
> -
> else the job crashed:
> BTF .btf.vmlinux.bin.o
> memory exhausted
>
> The libhardened_malloc source code version was:
> commit 72fb3576f568481a03076c62df37984f96bfdfeb
> of Tue Aug 16 07:47:26 2022 -0400
>
> Bottom line opinion: hardened_malloc's added security against exploit
> by malware costs too much.
> I will not choose hardened_malloc for this task.
hardened malloc is big and slow
Here is one end-to-end performance measurement of using hardened_malloc.

sudo sh -c "echo 1 >/proc/sys/vm/drop_caches"
/usr/bin/time rpmbuild -bc kernel-5.15.11-100.fc34.spec >rpmbuild.out 2>&1

For glibc, the result was
19274.30user 2522.87system 1:49:06elapsed 332%CPU (0avgtext+0avgdata 3389052maxresident)k
148504inputs+217900040outputs (18221major+1005715216minor)pagefaults 0swaps

For the same task, but preceded by
export LD_PRELOAD=/usr/lib64/libhardened_malloc.so
the result was
26108.73user 4805.55system 2:22:43elapsed 360%CPU (0avgtext+0avgdata 1881564maxresident)k
586704inputs+217900504outputs (31876major+1848825755minor)pagefaults 0swaps

So compared to glibc-2.33-21.fc34.x86_64, hardened_malloc used
1.3 times as much wall clock (8563 / 6536 in seconds)
1.35 times as much user CPU (26108 / 19274)
1.9 times as much sys CPU (4805 / 2522).

The environment was a physical machine running fedora 5.17.12-100.fc34.x86_64:
Intel Core i5-6500 @3.2GHz (4 CPU, 4 cores, 256kB L2 cache per core, 6MB L3 shared)
32GB DDR4 RAM
/usr ext4 on SSD, /data ext4 on 4TB spinning commodity hard drive

In the .spec, I changed to:
%define make_opts -j4
so that much of the compiling ran 4 jobs in parallel.
/usr/bin/top showed minimal use of swapspace: 4MB.

hardened_malloc required (as documented in its README.md):
- /etc/sysctl.d/hardened_malloc.conf
# (Fedora 5.17.12) default is 65530 (2**16 - 6),
# libhardened_malloc suggests 1048576 (2**20)
# we choose 1048570 (2**20 - 6)
vm.max_map_count = 1048570
-
else the job crashed:
BTF .btf.vmlinux.bin.o
memory exhausted

The libhardened_malloc source code version was:
commit 72fb3576f568481a03076c62df37984f96bfdfeb
of Tue Aug 16 07:47:26 2022 -0400

Bottom line opinion: hardened_malloc's added security against exploit by malware costs too much. I will not choose hardened_malloc for this task.