[RFC v1 0/2] mm: SLUB Freelist randomization

2016-05-18 Thread Thomas Garnier
This is RFC v1 for the SLUB Freelist randomization.

***Background:
This proposal follows the previous SLAB Freelist randomization patch submitted to -next.
It reuses parts of that implementation and keeps a similar approach.

The kernel heap allocators use a sequential freelist, which makes their
allocations predictable. This predictability makes kernel heap overflows
easier to exploit: an attacker can carefully prepare the kernel heap so
that a chunk they control is placed right after the chunk being overflowed.
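As a rough illustration of the idea (userspace C, not code from these patches),
the sketch below contrasts a sequential freelist, where object N+1 is always
handed out right after object N, with an index array shuffled by a Fisher-Yates
pass:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_OBJECTS 8

/* Fisher-Yates shuffle of the freelist index array. */
static void shuffle(unsigned int *list, unsigned int count)
{
	for (unsigned int i = count - 1; i > 0; i--) {
		unsigned int j = rand() % (i + 1);
		unsigned int tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

int main(void)
{
	unsigned int freelist[NR_OBJECTS];

	srand((unsigned int)time(NULL));

	/* Sequential freelist: allocation order is fully predictable. */
	for (unsigned int i = 0; i < NR_OBJECTS; i++)
		freelist[i] = i;
	printf("sequential order: ");
	for (unsigned int i = 0; i < NR_OBJECTS; i++)
		printf("%u ", freelist[i]);
	printf("\n");

	/*
	 * Randomized freelist: an attacker can no longer predict which
	 * object will be handed out after the one being overflowed.
	 */
	shuffle(freelist, NR_OBJECTS);
	printf("randomized order: ");
	for (unsigned int i = 0; i < NR_OBJECTS; i++)
		printf("%u ", freelist[i]);
	printf("\n");

	return 0;
}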

For example, these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the Freelist used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get the best entropy possible at the early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. It takes a similar
   approach to the SLAB one, but is tailored to the single linked freelist
   used by SLUB (a simplified sketch follows this list).
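Below is a simplified, userspace-only sketch of that single-freelist idea (the
struct and helper names are made up for illustration, not the actual SLUB
code): each free object stores a pointer to the next free one, and the
randomization only changes the order in which a fresh slab's objects get
linked together, so the allocation fast path is left untouched.

#include <stdio.h>
#include <string.h>

#define OBJ_SIZE   64
#define NR_OBJECTS 8

/*
 * Hypothetical layout: a "slab" is just a buffer carved into OBJ_SIZE
 * objects, and the freelist pointer lives at offset 0 of each free object.
 */
struct slab {
	void *freelist;                        /* head of the single freelist */
	char  objects[NR_OBJECTS * OBJ_SIZE];
};

static void set_next(void *obj, void *next)
{
	memcpy(obj, &next, sizeof(next));
}

static void *get_next(void *obj)
{
	void *next;

	memcpy(&next, obj, sizeof(next));
	return next;
}

/*
 * Link the freelist following a precomputed sequence of object indexes.
 * With seq = {0, 1, 2, ...} this builds the usual sequential freelist;
 * with a shuffled seq, the allocation order is randomized while the
 * alloc/free paths stay exactly the same.
 */
static void init_freelist(struct slab *s, const unsigned int *seq)
{
	void *prev = NULL;

	for (int i = NR_OBJECTS - 1; i >= 0; i--) {
		void *obj = s->objects + seq[i] * OBJ_SIZE;

		set_next(obj, prev);
		prev = obj;
	}
	s->freelist = prev;
}

static void *slab_alloc(struct slab *s)
{
	void *obj = s->freelist;

	if (obj)
		s->freelist = get_next(obj);
	return obj;
}

int main(void)
{
	/*
	 * Example precomputed random sequence (in the real allocator this
	 * would be generated once per cache from a proper entropy source).
	 */
	unsigned int seq[NR_OBJECTS] = { 5, 2, 7, 0, 3, 6, 1, 4 };
	struct slab s;

	init_freelist(&s, seq);
	for (int i = 0; i < NR_OBJECTS; i++) {
		long idx = ((char *)slab_alloc(&s) - s.objects) / OBJ_SIZE;

		printf("alloc #%d -> object %ld\n", i, idx);
	}
	return 0;
}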

***Performance data (no major changes):

slab_test, before:

Single thread testing
=
1. Kmalloc: Repeatedly allocate then free test
1 times kmalloc(8) -> 67 cycles kfree -> 101 cycles
1 times kmalloc(16) -> 68 cycles kfree -> 109 cycles
1 times kmalloc(32) -> 76 cycles kfree -> 119 cycles
1 times kmalloc(64) -> 88 cycles kfree -> 114 cycles
1 times kmalloc(128) -> 100 cycles kfree -> 122 cycles
1 times kmalloc(256) -> 128 cycles kfree -> 149 cycles
1 times kmalloc(512) -> 108 cycles kfree -> 152 cycles
1 times kmalloc(1024) -> 112 cycles kfree -> 158 cycles
1 times kmalloc(2048) -> 161 cycles kfree -> 208 cycles
1 times kmalloc(4096) -> 231 cycles kfree -> 239 cycles
1 times kmalloc(8192) -> 341 cycles kfree -> 270 cycles
1 times kmalloc(16384) -> 481 cycles kfree -> 323 cycles
2. Kmalloc: alloc/free test
1 times kmalloc(8)/kfree -> 90 cycles
1 times kmalloc(16)/kfree -> 89 cycles
1 times kmalloc(32)/kfree -> 88 cycles
1 times kmalloc(64)/kfree -> 88 cycles
1 times kmalloc(128)/kfree -> 94 cycles
1 times kmalloc(256)/kfree -> 87 cycles
1 times kmalloc(512)/kfree -> 91 cycles
1 times kmalloc(1024)/kfree -> 90 cycles
1 times kmalloc(2048)/kfree -> 90 cycles
1 times kmalloc(4096)/kfree -> 90 cycles
1 times kmalloc(8192)/kfree -> 90 cycles
1 times kmalloc(16384)/kfree -> 642 cycles

After:

Single thread testing
=
1. Kmalloc: Repeatedly allocate then free test
1 times kmalloc(8) -> 60 cycles kfree -> 74 cycles
1 times kmalloc(16) -> 63 cycles kfree -> 78 cycles
1 times kmalloc(32) -> 72 cycles kfree -> 85 cycles
1 times kmalloc(64) -> 91 cycles kfree -> 99 cycles
1 times kmalloc(128) -> 112 cycles kfree -> 109 cycles
1 times kmalloc(256) -> 127 cycles kfree -> 120 cycles
1 times kmalloc(512) -> 125 cycles kfree -> 121 cycles
1 times kmalloc(1024) -> 128 cycles kfree -> 125 cycles
1 times kmalloc(2048) -> 167 cycles kfree -> 141 cycles
1 times kmalloc(4096) -> 249 cycles kfree -> 174 cycles
1 times kmalloc(8192) -> 377 cycles kfree -> 225 cycles
1 times kmalloc(16384) -> 459 cycles kfree -> 247 cycles
2. Kmalloc: alloc/free test
1 times kmalloc(8)/kfree -> 72 cycles
1 times kmalloc(16)/kfree -> 74 cycles
1 times kmalloc(32)/kfree -> 71 cycles
1 times kmalloc(64)/kfree -> 75 cycles
1 times kmalloc(128)/kfree -> 71 cycles
1 times kmalloc(256)/kfree -> 72 cycles
1 times kmalloc(512)/kfree -> 72 cycles
1 times kmalloc(1024)/kfree -> 73 cycles
1 times kmalloc(2048)/kfree -> 73 cycles
1 times kmalloc(4096)/kfree -> 72 cycles
1 times kmalloc(8192)/kfree -> 72 cycles
1 times kmalloc(16384)/kfree -> 546 cycles

Kernbench, before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)


