Re: [PATCH] efi: pstore: Allow dynamic initialization based on module parameter
On Wed, 03 Jan 2024 15:40:32 -0300, Guilherme G. Piccoli wrote: > The efi-pstore module parameter "pstore_disable" warrants that users > are able to deactivate such backend. There is also a Kconfig option > for the default value of this parameter. It was originally added due > to some bad UEFI FW implementations that could break with many variables > written. > > Some distros (such as Arch Linux) set this in their config file still > nowadays. And once it is set, even being a writable module parameter, > there is effectively no way to make use of efi-pstore anymore. > If "pstore_disable" is set to true, the init function of the module exits > early and is never called again after the initcall processing. > > [...] Applied to for-next/pstore, thanks! [1/1] efi: pstore: Allow dynamic initialization based on module parameter https://git.kernel.org/kees/c/c3f849caf81b Take care, -- Kees Cook
Re: [RFC PATCH v2 1/9] pstore: move pstore creator id, section type and record struct to common header
On Mon, Sep 25, 2023 at 03:44:18PM +0800, Shuai Xue wrote: > Move pstore creator id, section type and record struct to the common > header, so that it can be use by MCE and GHES driver. I would prefer this was not in the pstore header -- this is a backend detail that should stay in backend headers. -Kees -- Kees Cook
Re: [RFC] random: UEFI RNG input is bootloader randomness
On Sat, Sep 28, 2019 at 12:14:28PM +0200, Dominik Brodowski wrote: > Depending on RANDOM_TRUST_BOOTLOADER, bootloader-provided randomness > is credited as entropy. As the UEFI seeding entropy pool is seeded by > the UEFI firmware/bootloader, add its content as bootloader randomness. > > Note that this UEFI (v2.4 or newer) feature is currently only > implemented for EFI stub booting on ARM, and further note that > RANDOM_TRUST_BOOTLOADER must only be enabled if there indeed is > sufficient trust in the bootloader _and_ its source of randomness. > > Signed-off-by: Dominik Brodowski > Cc: Ard Biesheuvel > Cc: Hsin-Yi Wang > Cc: Stephen Boyd > Cc: Rob Herring > Cc: Theodore Ts'o > Cc: Lee, Chun-Yi Reviewed-by: Kees Cook -Kees > > --- > > Untested patch, as efi_random_get_seed() is only hooked up on ARM, > and the firmware on my old x86 laptop only has UEFI v2.31 anyway. > > Thanks, > Dominik > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index 8f1ab04f6743..db0bffce754e 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -545,7 +545,7 @@ int __init efi_config_parse_tables(void *config_tables, > int count, int sz, > sizeof(*seed) + size); > if (seed != NULL) { > pr_notice("seeding entropy pool\n"); > - add_device_randomness(seed->bits, seed->size); > + add_bootloader_randomness(seed->bits, > seed->size); > early_memunmap(seed, sizeof(*seed) + size); > } else { > pr_err("Could not map UEFI random seed!\n"); -- Kees Cook
Re: [PATCH V34 28/29] efi: Restrict efivar_ssdt_load when the kernel is locked down
On Fri, Jun 21, 2019 at 05:03:57PM -0700, Matthew Garrett wrote: > efivar_ssdt_load allows the kernel to import arbitrary ACPI code from an > EFI variable, which gives arbitrary code execution in ring 0. Prevent > that when the kernel is locked down. > > Signed-off-by: Matthew Garrett Reviewed-by: Kees Cook -Kees > Cc: Ard Biesheuvel > Cc: linux-efi@vger.kernel.org > --- > drivers/firmware/efi/efi.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index 55b77c576c42..9f92a013ab27 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > > #include > > @@ -242,6 +243,11 @@ static void generic_ops_unregister(void) > static char efivar_ssdt[EFIVAR_SSDT_NAME_MAX] __initdata; > static int __init efivar_ssdt_setup(char *str) > { > + int ret = security_locked_down(LOCKDOWN_ACPI_TABLES); > + > + if (ret) > + return ret; > + > if (strlen(str) < sizeof(efivar_ssdt)) > memcpy(efivar_ssdt, str, strlen(str)); > else > -- > 2.22.0.410.gd8fdbe21b5-goog > -- Kees Cook
Re: [PATCH] pstore: Convert buf_lock to semaphore
On Tue, Dec 4, 2018 at 7:41 AM Sebastian Andrzej Siewior wrote: > > On 2018-11-30 14:47:36 [-0800], Kees Cook wrote: > > diff --git a/drivers/firmware/efi/efi-pstore.c > > b/drivers/firmware/efi/efi-pstore.c > > index cfe87b465819..0f7d97917197 100644 > > --- a/drivers/firmware/efi/efi-pstore.c > > +++ b/drivers/firmware/efi/efi-pstore.c > > @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record > > *record) > > efi_name[i] = name[i]; > > > > ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES, > > - !pstore_cannot_block_path(record->reason), > > - record->size, record->psi->buf); > > + preemptible(), record->size, record->psi->buf); > > Well. Better I think. > might_sleep() / preempt_count_equals() checks for preemptible() + > rcu_preempt_depth(). > kmsg_dump() starts with rcu_read_lock() which means with this patch applied I > got: Okay, so, if kmsg_dump() uses rcu_read_lock(), that means efi-pstore can _never_ sleep, and it's nothing to do with pstore internals. :( I guess we just hard-code it, then? And efi-pstore should probably only attach to pstore if it has a nonblock implementation (and warn if one isn't available). -Kees -- Kees Cook
Re: [PATCH] pstore: Convert buf_lock to semaphore
On Tue, Dec 4, 2018 at 7:41 AM Sebastian Andrzej Siewior wrote: > > On 2018-11-30 14:47:36 [-0800], Kees Cook wrote: > > diff --git a/drivers/firmware/efi/efi-pstore.c > > b/drivers/firmware/efi/efi-pstore.c > > index cfe87b465819..0f7d97917197 100644 > > --- a/drivers/firmware/efi/efi-pstore.c > > +++ b/drivers/firmware/efi/efi-pstore.c > > @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record > > *record) > > efi_name[i] = name[i]; > > > > ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES, > > - !pstore_cannot_block_path(record->reason), > > - record->size, record->psi->buf); > > + preemptible(), record->size, record->psi->buf); > > Well. Better I think. > might_sleep() / preempt_count_equals() checks for preemptible() + > rcu_preempt_depth(). > kmsg_dump() starts with rcu_read_lock() which means with this patch applied I > got: > > | BUG: sleeping function called from invalid context at > kernel/sched/completion.c:99 > | in_atomic(): 0, irqs_disabled(): 0, pid: 2286, name: sig-xstate-bum PC: 0 > RCU: 1 > | Preemption disabled at: > | [] __queue_work+0x95/0x440 > | CPU: 30 PID: 2286 Comm: sig-xstate-bum Tainted: G D > 4.20.0-rc3+ #90 > | Call Trace: > | dump_stack+0x4f/0x6a > | ___might_sleep.cold.91+0xef/0x100 > | __might_sleep+0x50/0x90 > | wait_for_completion+0x32/0x130 > | virt_efi_query_variable_info+0x14e/0x160 > | efi_query_variable_store+0x51/0x1a0 > | efivar_entry_set_safe+0xa3/0x1b0 > | efi_pstore_write+0x110/0x140 > | pstore_dump+0x114/0x320 > | kmsg_dump+0xa4/0xd0 > | oops_exit+0x7f/0x90 > | oops_end+0x67/0xd0 > | die+0x41/0x4a > | do_general_protection+0xc1/0x150 > | general_protection+0x1e/0x30 > | RIP: 0010:__fpu__restore_sig+0x1c1/0x540 > > just in case you wonder why both counter are zero and it still creates > this backtrace. Oh, hmm. That didn't show up in my testing. Any thoughts on a solution? > > > if (record->reason == KMSG_DUMP_OOPS) > > efivar_run_worker(); > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > > index 2387cb74f729..afdfd3687f94 100644 > > --- a/fs/pstore/platform.c > > +++ b/fs/pstore/platform.c > > @@ -400,23 +401,20 @@ static void pstore_dump(struct kmsg_dumper *dumper, > > unsigned long total = 0; > > const char *why; > > unsigned intpart = 1; > > - unsigned long flags = 0; > > - int is_locked; > > int ret; > > > > why = get_reason_str(reason); > > > > - if (pstore_cannot_block_path(reason)) { > > - is_locked = spin_trylock_irqsave(>buf_lock, flags); > > - if (!is_locked) { > > - pr_err("pstore dump routine blocked in %s path, may > > corrupt error record\n" > > -, in_nmi() ? "NMI" : why); > > + if (down_trylock(>buf_lock)) { > > + /* Failed to acquire lock: give up if we cannot wait. */ > > + if (pstore_cannot_wait(reason)) { > > + pr_err("dump skipped in %s path: may corrupt error > > record\n", > > + in_nmi() ? "NMI" : why); > > return; > > } > > - } else { > > - spin_lock_irqsave(>buf_lock, flags); > > - is_locked = 1; > > + down_interruptible(>buf_lock); > > In function ‘pstore_dump’: > fs/pstore/platform.c:393:3: warning: ignoring return value of > ‘down_interruptible’, declared with attribute warn_unused_result > [-Wunused-result] >down_interruptible(>buf_lock); >^ > Thanks, yes. I've fixed this in the version in -next. -Kees -- Kees Cook
Re: [PATCH] pstore: Convert buf_lock to semaphore
On Mon, Dec 3, 2018 at 8:26 AM Arnd Bergmann wrote: > > On Mon, Dec 3, 2018 at 12:18 PM Sebastian Andrzej Siewior > wrote: > > > > On 2018-12-01 09:42:38 [+0100], Arnd Bergmann wrote: > > > You are right that you can't take (or release) a mutex from interrupt > > > context. However, I don't think converting a spinlock to a semaphore > > > is going to help here either. > > > > you can acquire a semaphore with a try_lock from interrupts context but > > you can't do that with a mutex. You can also a acquire a semaphore in > > one context and release in another. > > Right, that is the obvious part. > > > I haven't looked a general picture yet, will try to do so later today or > > tomorrow. > > To speed this up, the problem I'm referring to is in > virt_efi_query_variable_info() and efi_queue_work(), > as in the original BUG_ON() that Kees quoted: > > > BUG: sleeping function called from invalid context at > > kernel/sched/completion.c:99 > > |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum > > |Preemption disabled at: > > |[] pstore_dump+0x72/0x330 > > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D > > 4.20.0-rc3 #45 > > |Call Trace: > > | dump_stack+0x4f/0x6a > > | ___might_sleep.cold.91+0xd3/0xe4 > > | __might_sleep+0x50/0x90 > > | wait_for_completion+0x32/0x130 > > | virt_efi_query_variable_info+0x14e/0x160 > > | efi_query_variable_store+0x51/0x1a0 > > | efivar_entry_set_safe+0xa3/0x1b0 > > | efi_pstore_write+0x109/0x140 > > | pstore_dump+0x11c/0x330 > > | kmsg_dump+0xa4/0xd0 > > | oops_exit+0x22/0x30 > > This will no longer happen when pstore is called from process > context with his patch, but we still get the same thing if we call > pstore from interrupt context, unless both the down_interruptible > and wait_for_completion in there are also changed to > nonblocking calls. However, once they are no longer blocking, > we don't need the outer lock to be changed from spinlock > to semaphore any more either. My proposed patch was trying to do two things: - have pstore not make things _worse_ (do not hold a spin lock) - use preemptible() in efi pstore backend to get it right: ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES, - !pstore_cannot_block_path(record->reason), - record->size, record->psi->buf); + preemptible(), record->size, record->psi->buf); IIUC, that would make efi-vars take the nonblocking path so we don't trip over the might_sleep(). -Kees -- Kees Cook
Re: [PATCH] pstore: Convert buf_lock to semaphore
On Fri, Nov 30, 2018 at 2:52 PM Arnd Bergmann wrote: > > On Fri, Nov 30, 2018 at 11:48 PM Kees Cook wrote: > > > > Instead of running with interrupts disabled, use a semaphore. This should > > make it easier for backends that may need to sleep (e.g. EFI) when > > performing a write: > > > > |BUG: sleeping function called from invalid context at > > kernel/sched/completion.c:99 > > |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum > > |Preemption disabled at: > > |[] pstore_dump+0x72/0x330 > > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D > > 4.20.0-rc3 #45 > > |Call Trace: > > | dump_stack+0x4f/0x6a > > | ___might_sleep.cold.91+0xd3/0xe4 > > | __might_sleep+0x50/0x90 > > | wait_for_completion+0x32/0x130 > > | virt_efi_query_variable_info+0x14e/0x160 > > | efi_query_variable_store+0x51/0x1a0 > > | efivar_entry_set_safe+0xa3/0x1b0 > > | efi_pstore_write+0x109/0x140 > > | pstore_dump+0x11c/0x330 > > | kmsg_dump+0xa4/0xd0 > > | oops_exit+0x22/0x30 > > ... > > > > Reported-by: Sebastian Andrzej Siewior > > Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") > > Signed-off-by: Kees Cook > > Hmm, I've actually been working on a patch set recently to deprecate > all semaphores from the kernel and replace them with something > else as much as possible. > > Why can't this be a mutex instead? My understanding is that I can't use a mutex in interrupt context (Documentation/kernel-hacking/locking.rst) and pstore_dump() needs to handle being called from anywhere. I'm surprised it's managed to get away with using a spinlock for this long. :P -Kees -- Kees Cook
[PATCH] pstore: Convert buf_lock to semaphore
Instead of running with interrupts disabled, use a semaphore. This should make it easier for backends that may need to sleep (e.g. EFI) when performing a write: |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum |Preemption disabled at: |[] pstore_dump+0x72/0x330 |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 |Call Trace: | dump_stack+0x4f/0x6a | ___might_sleep.cold.91+0xd3/0xe4 | __might_sleep+0x50/0x90 | wait_for_completion+0x32/0x130 | virt_efi_query_variable_info+0x14e/0x160 | efi_query_variable_store+0x51/0x1a0 | efivar_entry_set_safe+0xa3/0x1b0 | efi_pstore_write+0x109/0x140 | pstore_dump+0x11c/0x330 | kmsg_dump+0xa4/0xd0 | oops_exit+0x22/0x30 ... Reported-by: Sebastian Andrzej Siewior Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") Signed-off-by: Kees Cook --- arch/powerpc/kernel/nvram_64.c| 2 -- drivers/acpi/apei/erst.c | 1 - drivers/firmware/efi/efi-pstore.c | 4 +-- fs/pstore/platform.c | 41 +++ fs/pstore/ram.c | 1 - include/linux/pstore.h| 7 +++--- 6 files changed, 24 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c index 22e9d281324d..e7d4ce6964ae 100644 --- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -563,8 +563,6 @@ static int nvram_pstore_init(void) nvram_pstore_info.buf = oops_data; nvram_pstore_info.bufsize = oops_data_sz; - spin_lock_init(_pstore_info.buf_lock); - rc = pstore_register(_pstore_info); if (rc && (rc != -EPERM)) /* Print error only when pstore.backend == nvram */ diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index a5e1d963208e..9953e50667ec 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -1176,7 +1176,6 @@ static int __init erst_init(void) "Error Record Serialization Table (ERST) support is initialized.\n"); buf = kmalloc(erst_erange.size, GFP_KERNEL); - spin_lock_init(_info.buf_lock); if (buf) { erst_info.buf = buf + sizeof(struct cper_pstore_record); erst_info.bufsize = erst_erange.size - diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index cfe87b465819..0f7d97917197 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record *record) efi_name[i] = name[i]; ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES, - !pstore_cannot_block_path(record->reason), - record->size, record->psi->buf); + preemptible(), record->size, record->psi->buf); if (record->reason == KMSG_DUMP_OOPS) efivar_run_worker(); @@ -369,7 +368,6 @@ static __init int efivars_pstore_init(void) return -ENOMEM; efi_pstore_info.bufsize = 1024; - spin_lock_init(_pstore_info.buf_lock); if (pstore_register(_pstore_info)) { kfree(efi_pstore_info.buf); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 2387cb74f729..afdfd3687f94 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -161,26 +161,27 @@ static const char *get_reason_str(enum kmsg_dump_reason reason) } } -bool pstore_cannot_block_path(enum kmsg_dump_reason reason) +/* + * Should pstore_dump() wait for a concurrent pstore_dump()? If + * not, the current pstore_dump() will report a failure to dump + * and return. + */ +static bool pstore_cannot_wait(enum kmsg_dump_reason reason) { - /* -* In case of NMI path, pstore shouldn't be blocked -* regardless of reason. -*/ + /* In NMI path, pstore shouldn't block regardless of reason. */ if (in_nmi()) return true; switch (reason) { /* In panic case, other cpus are stopped by smp_send_stop(). */ case KMSG_DUMP_PANIC: - /* Emergency restart shouldn't be blocked by spin lock. */ + /* Emergency restart shouldn't be blocked. */ case KMSG_DUMP_EMERG: return true; default: return false; } } -EXPORT_SYMBOL_GPL(pstore_cannot_block_path); #if IS_ENABLED(CONFIG_PSTORE_DEFLATE_COMPRESS) static int zbufsize_deflate(size_t size) @@ -400,23 +401,20 @@ static void pstore_dump(struct kmsg_dumper *dumper, unsigned long total = 0; const char *why; unsigned intpart = 1; - unsigned long flags = 0; - int is_locked; int ret; why = get_reason_str(re
Re: EFI-pstore, sleeping from back context after fault
On Fri, Nov 30, 2018 at 1:20 PM Kees Cook wrote: > > On Fri, Nov 30, 2018 at 3:43 AM Sebastian Andrzej Siewior > wrote: > > > > On 2018-11-29 13:43:58 [-0800], Kees Cook wrote: > > > On Mon, Nov 26, 2018 at 9:04 AM Sebastian Andrzej Siewior > > > wrote: > > > This bug got handled by Jann Horn, yes? (I remember seeing a related > > > thread go by...) > > > > Correct, fix sits in the tip tree. Nevertheless it was useful to spot > > the other thing. > > > > > > This looks like it comes from commit 21b3ddd39fee ("efi: Don't use > > > > spinlocks for efi vars") because it replaced a spin_lock() with > > > > down_interruptible() in this case. In this case, since pstore_dump() > > > > holds psinfo->buf_lock with disabled interrupts we can't block… > > > > > > > > I have this as a workaround: > > > > > > I'm not sure this is correct... > > > > > > > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c > > > > index 9336ffdf6e2c..d4dcfe46eb2e 100644 > > > > --- a/drivers/firmware/efi/vars.c > > > > +++ b/drivers/firmware/efi/vars.c > > > > @@ -755,7 +755,9 @@ int efivar_entry_set_safe(efi_char16_t *name, > > > > efi_guid_t vendor, u32 attributes, > > > > return -EINTR; > > > > } > > > > > > > > - status = check_var_size(attributes, size + ucs2_strsize(name, > > > > 1024)); > > > > + status = ops->query_variable_store(attributes, > > > > + size + ucs2_strsize(name, > > > > 1024), > > > > + block); > > > > > > Why does this change help? > > > > check_var_size() uses false as the argument for `block'. > > check_var_size_nonblocking uses true as the argument for `block'. > > Doing this ops->… allows to pass the `block' argument as received from > > caller. And the functions above already use the `block' argument. Plus > > the next hunk makes sure that `block' is set properly. > > Here's the code I see: > > if (!block && ops->set_variable_nonblocking) > return efivar_entry_set_nonblocking(name, vendor, attributes, > size, data); > > if (!block) { > if (down_trylock(_lock)) > return -EBUSY; > } else { > if (down_interruptible(_lock)) > return -EINTR; > } > > status = check_var_size(attributes, size + ucs2_strsize(name, 1024)); > if (status != EFI_SUCCESS) { > up(_lock); > return -ENOSPC; > } > > status = ops->set_variable(name, , attributes, size, data); > > up(_lock); > > In the !block case, this will already take the > efivar_entry_set_nonblocking() path. Isn't that sufficient? (i.e. I'm > not sure why I see a change needed in the EFI code, just the > pstore_cannot_block_path() path? > > > > > > > if (status != EFI_SUCCESS) { > > > > up(_lock); > > > > return -ENOSPC; > > > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > > > > index b821054ca3ed..9aa27a2c8d36 100644 > > > > --- a/fs/pstore/platform.c > > > > +++ b/fs/pstore/platform.c > > > > @@ -127,10 +127,10 @@ static const char *get_reason_str(enum > > > > kmsg_dump_reason reason) > > > > bool pstore_cannot_block_path(enum kmsg_dump_reason reason) > > > > { > > > > /* > > > > -* In case of NMI path, pstore shouldn't be blocked > > > > -* regardless of reason. > > > > +* In case of disabled preemption / interrupts we can't block > > > > on any > > > > +* lock regardless of reason. > > > > */ > > > > - if (in_nmi()) > > > > + if (in_atomic() || irqs_disabled()) > > > > return true; > > > > > > > > switch (reason) { > > > > > > I'd like to avoid any case where pstore might "miss" a crash report, > > > though... this seems to make it easier for it to fail to record a > > > crash, yes? > > > > Well, if the lock is occupied, yes. But waiting for the completion with
Re: EFI-pstore, sleeping from back context after fault
On Fri, Nov 30, 2018 at 3:43 AM Sebastian Andrzej Siewior wrote: > > On 2018-11-29 13:43:58 [-0800], Kees Cook wrote: > > On Mon, Nov 26, 2018 at 9:04 AM Sebastian Andrzej Siewior > > wrote: > > This bug got handled by Jann Horn, yes? (I remember seeing a related > > thread go by...) > > Correct, fix sits in the tip tree. Nevertheless it was useful to spot > the other thing. > > > > This looks like it comes from commit 21b3ddd39fee ("efi: Don't use > > > spinlocks for efi vars") because it replaced a spin_lock() with > > > down_interruptible() in this case. In this case, since pstore_dump() > > > holds psinfo->buf_lock with disabled interrupts we can't block… > > > > > > I have this as a workaround: > > > > I'm not sure this is correct... > > > > > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c > > > index 9336ffdf6e2c..d4dcfe46eb2e 100644 > > > --- a/drivers/firmware/efi/vars.c > > > +++ b/drivers/firmware/efi/vars.c > > > @@ -755,7 +755,9 @@ int efivar_entry_set_safe(efi_char16_t *name, > > > efi_guid_t vendor, u32 attributes, > > > return -EINTR; > > > } > > > > > > - status = check_var_size(attributes, size + ucs2_strsize(name, > > > 1024)); > > > + status = ops->query_variable_store(attributes, > > > + size + ucs2_strsize(name, > > > 1024), > > > + block); > > > > Why does this change help? > > check_var_size() uses false as the argument for `block'. > check_var_size_nonblocking uses true as the argument for `block'. > Doing this ops->… allows to pass the `block' argument as received from > caller. And the functions above already use the `block' argument. Plus > the next hunk makes sure that `block' is set properly. Here's the code I see: if (!block && ops->set_variable_nonblocking) return efivar_entry_set_nonblocking(name, vendor, attributes, size, data); if (!block) { if (down_trylock(_lock)) return -EBUSY; } else { if (down_interruptible(_lock)) return -EINTR; } status = check_var_size(attributes, size + ucs2_strsize(name, 1024)); if (status != EFI_SUCCESS) { up(_lock); return -ENOSPC; } status = ops->set_variable(name, , attributes, size, data); up(_lock); In the !block case, this will already take the efivar_entry_set_nonblocking() path. Isn't that sufficient? (i.e. I'm not sure why I see a change needed in the EFI code, just the pstore_cannot_block_path() path? > > > > if (status != EFI_SUCCESS) { > > > up(_lock); > > > return -ENOSPC; > > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > > > index b821054ca3ed..9aa27a2c8d36 100644 > > > --- a/fs/pstore/platform.c > > > +++ b/fs/pstore/platform.c > > > @@ -127,10 +127,10 @@ static const char *get_reason_str(enum > > > kmsg_dump_reason reason) > > > bool pstore_cannot_block_path(enum kmsg_dump_reason reason) > > > { > > > /* > > > -* In case of NMI path, pstore shouldn't be blocked > > > -* regardless of reason. > > > +* In case of disabled preemption / interrupts we can't block on > > > any > > > +* lock regardless of reason. > > > */ > > > - if (in_nmi()) > > > + if (in_atomic() || irqs_disabled()) > > > return true; > > > > > > switch (reason) { > > > > I'd like to avoid any case where pstore might "miss" a crash report, > > though... this seems to make it easier for it to fail to record a > > crash, yes? > > Well, if the lock is occupied, yes. But waiting for the completion with > disabled interrupts isn't much better :) > > pstore_cannot_block_path() is used in two places. One caller is taking a > spin_lock(). So that one could keep using the "old" function. > > The other caller uses it before acquiring the semaphore. The non-try > version can't be used with preemption or interrupts disabled. > > You could split those two cases and let the sempaphire user use the > "(in_atomic() || irqs_disabled())" from above. preempt.h:#define in_nmi() (preempt_count() & NMI_MASK) preempt.h:#define in_atomic() (preempt_count() != 0) NMI is a special case of preempt_count() test. in_atomic() is a non-zero preempt_count() test. It looks like we want to use preemptible()? #define preemptible() (preempt_count() == 0 && !irqs_disabled()) So, perhaps: - if (in_nmi()) + if (!preemptible()) What do you think? -- Kees Cook
Re: EFI-pstore, sleeping from back context after fault
On Mon, Nov 26, 2018 at 9:04 AM Sebastian Andrzej Siewior wrote: > > So I triggered a bug in x86 code. First the "okay" backtrace: > |BUG: GPF in non-whitelisted uaccess (non-canonical address?) > |general protection fault: [#1] PREEMPT SMP NOPTI > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Not tainted 4.20.0-rc3 #45 > |RIP: 0010:__fpu__restore_sig+0x1c1/0x540 > |Call Trace: > | fpu__restore_sig+0x28/0x40 > | restore_sigcontext+0x13a/0x180 > | __ia32_sys_rt_sigreturn+0xae/0x100 > | do_syscall_64+0x4f/0x100 > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > |RIP: 0033:0x7f9b06aea227 > |---[ end trace a45ac23b593e9ab0 ]--- This bug got handled by Jann Horn, yes? (I remember seeing a related thread go by...) > > and now the not okay part: > > |BUG: sleeping function called from invalid context at > kernel/sched/completion.c:99 > |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum > |Preemption disabled at: > |[] pstore_dump+0x72/0x330 > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D > 4.20.0-rc3 #45 > |Call Trace: > | dump_stack+0x4f/0x6a > | ___might_sleep.cold.91+0xd3/0xe4 > | __might_sleep+0x50/0x90 > | wait_for_completion+0x32/0x130 > | virt_efi_query_variable_info+0x14e/0x160 > | efi_query_variable_store+0x51/0x1a0 > | efivar_entry_set_safe+0xa3/0x1b0 > | efi_pstore_write+0x109/0x140 Eww. :) > | pstore_dump+0x11c/0x330 > | kmsg_dump+0xa4/0xd0 > | oops_exit+0x22/0x30 > | oops_end+0x67/0xd0 > | die+0x41/0x4a > | do_general_protection+0xc1/0x150 > | general_protection+0x1e/0x30 > |RIP: 0010:__fpu__restore_sig+0x1c1/0x540 > | fpu__restore_sig+0x28/0x40 > | restore_sigcontext+0x13a/0x180 > | __ia32_sys_rt_sigreturn+0xae/0x100 > | do_syscall_64+0x4f/0x100 > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > |RIP: 0033:0x7f9b06aea227 > … [ moar backtrace ] > > This looks like it comes from commit 21b3ddd39fee ("efi: Don't use > spinlocks for efi vars") because it replaced a spin_lock() with > down_interruptible() in this case. In this case, since pstore_dump() > holds psinfo->buf_lock with disabled interrupts we can't block… > > I have this as a workaround: I'm not sure this is correct... > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c > index 9336ffdf6e2c..d4dcfe46eb2e 100644 > --- a/drivers/firmware/efi/vars.c > +++ b/drivers/firmware/efi/vars.c > @@ -755,7 +755,9 @@ int efivar_entry_set_safe(efi_char16_t *name, efi_guid_t > vendor, u32 attributes, > return -EINTR; > } > > - status = check_var_size(attributes, size + ucs2_strsize(name, 1024)); > + status = ops->query_variable_store(attributes, > + size + ucs2_strsize(name, 1024), > + block); Why does this change help? > if (status != EFI_SUCCESS) { > up(_lock); > return -ENOSPC; > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > index b821054ca3ed..9aa27a2c8d36 100644 > --- a/fs/pstore/platform.c > +++ b/fs/pstore/platform.c > @@ -127,10 +127,10 @@ static const char *get_reason_str(enum kmsg_dump_reason > reason) > bool pstore_cannot_block_path(enum kmsg_dump_reason reason) > { > /* > -* In case of NMI path, pstore shouldn't be blocked > -* regardless of reason. > +* In case of disabled preemption / interrupts we can't block on any > +* lock regardless of reason. > */ > - if (in_nmi()) > + if (in_atomic() || irqs_disabled()) > return true; > > switch (reason) { I'd like to avoid any case where pstore might "miss" a crash report, though... this seems to make it easier for it to fail to record a crash, yes? I have some pending clean-ups in this area (buf_lock may not be needed at all, actually), so I'll try to work through those too. -Kees -- Kees Cook
Re: [PATCH v7 3/3] x86/boot/KASLR: Limit kaslr to choosing the immovable memory
On Thu, Sep 13, 2018 at 3:46 AM, Chao Fan wrote: > If CONFIG_MEMORY_HOTREMOVE enabled and the amount of immovable > memory regions is not zero. Calculate the intersection between memory > regions from e820/efi memory table and immovable memory regions. > > Signed-off-by: Chao Fan > --- > arch/x86/boot/compressed/kaslr.c | 71 +++- > 1 file changed, 60 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/boot/compressed/kaslr.c > b/arch/x86/boot/compressed/kaslr.c > index 0c3567bc231c..0a7ef2daf169 100644 > --- a/arch/x86/boot/compressed/kaslr.c > +++ b/arch/x86/boot/compressed/kaslr.c > @@ -101,6 +101,11 @@ static bool memmap_too_large; > /* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */ > static unsigned long long mem_limit = ULLONG_MAX; > > +#ifdef CONFIG_MEMORY_HOTREMOVE > +/* Store the immovable memory regions */ > +extern struct mem_vector immovable_mem[MAX_NUMNODES*2]; > +#endif > + > > enum mem_avoid_index { > MEM_AVOID_ZO_RANGE = 0, > @@ -577,9 +582,9 @@ static unsigned long slots_fetch_random(void) > return 0; > } > > -static void process_mem_region(struct mem_vector *entry, > - unsigned long minimum, > - unsigned long image_size) > +static void slots_count(struct mem_vector *entry, > + unsigned long minimum, > + unsigned long image_size) > { > struct mem_vector region, overlap; > unsigned long start_orig, end; > @@ -655,6 +660,56 @@ static void process_mem_region(struct mem_vector *entry, > } > } > > +static bool process_mem_region(struct mem_vector *region, > + unsigned long long minimum, > + unsigned long long image_size) > +{ > + int i; > + /* > +* If no immovable memory found, or MEMORY_HOTREMOVE disabled, > +* walk all the regions, so use region directely. > +*/ > + if (num_immovable_mem == 0) { > + slots_count(region, minimum, image_size); > + > + if (slot_area_index == MAX_SLOT_AREA) { > + debug_putstr("Aborted e820/efi memmap scan > (slot_areas full)!\n"); > + return 1; > + } > + return 0; > + } > + > +#ifdef CONFIG_MEMORY_HOTREMOVE > + /* > +* If immovable memory found, filter the intersection between > +* immovable memory and region to slots_count. > +* Otherwise, go on old code. > +*/ > + for (i = 0; i < num_immovable_mem; i++) { > + struct mem_vector entry; > + unsigned long long start, end, entry_end, region_end; > + > + start = immovable_mem[i].start; > + end = start + immovable_mem[i].size; > + region_end = region->start + region->size; > + > + entry.start = clamp(region->start, start, end); > + entry_end = clamp(region_end, start, end); > + > + if (entry.start + image_size < entry_end) { Can this logic be rewritten to use the existing mem_overlaps() check instead? I think that would make it much more readable. Otherwise, yes, this all looks fine. -Kees > + entry.size = entry_end - entry.start; > + slots_count(, minimum, image_size); > + > + if (slot_area_index == MAX_SLOT_AREA) { > + debug_putstr("Aborted e820/efi memmap scan > (slot_areas full)!\n"); > + return 1; > + } > + } > + } > + return 0; > +#endif > +} > + > #ifdef CONFIG_EFI > /* > * Returns true if mirror region found (and must have been processed > @@ -720,11 +775,8 @@ process_efi_entries(unsigned long minimum, unsigned long > image_size) > > region.start = md->phys_addr; > region.size = md->num_pages << EFI_PAGE_SHIFT; > - process_mem_region(, minimum, image_size); > - if (slot_area_index == MAX_SLOT_AREA) { > - debug_putstr("Aborted EFI scan (slot_areas full)!\n"); > + if (process_mem_region(, minimum, image_size)) > break; > - } > } > return true; > } > @@ -751,11 +803,8 @@ static void process_e820_entries(unsigned long minimum, > continue; > region.start = entry->add
Re: [PATCH v7 2/3] x86/boot/KASLR: Walk srat tables to filter immovable memory
On Thu, Sep 13, 2018 at 3:46 AM, Chao Fan wrote: > If CONFIG_MEMORY_HOTREMOVE enabled, walk through the acpi srat memory > tables and store those immovable memory regions so that kaslr can get > where to choose for randomization. > > Signed-off-by: Chao Fan > --- > arch/x86/boot/compressed/kaslr.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/arch/x86/boot/compressed/kaslr.c > b/arch/x86/boot/compressed/kaslr.c > index 9ed9709d9947..0c3567bc231c 100644 > --- a/arch/x86/boot/compressed/kaslr.c > +++ b/arch/x86/boot/compressed/kaslr.c > @@ -417,6 +417,11 @@ static void mem_avoid_init(unsigned long input, unsigned > long input_size, > /* Mark the memmap regions we need to avoid */ > handle_mem_options(); > > +#ifdef CONFIG_MEMORY_HOTREMOVE > + /* Mark the immovable regions we need to choose */ > + get_immovable_mem(); > +#endif Style nit-pick: I'd prefer the definition of get_immovable_mem() do the ifdef so it doesn't appear here. -Kees > + > #ifdef CONFIG_X86_VERBOSE_BOOTUP > /* Make sure video RAM can be used. */ > add_identity_map(0, PMD_SIZE); > -- > 2.17.1 > > > -- Kees Cook Pixel Security
Re: [PATCH v7 1/3] x86/boot: Add acpitb.c to parse acpi tables
On Thu, Sep 13, 2018 at 3:46 AM, Chao Fan wrote: > There is a bug that kaslr may randomly chooses some positions > which are located in movable memory regions. This will break memory > hotplug feature and make the movable memory chosen by KASLR can't be > removed. So dig SRAT table from ACPI tables to get memory information. > > Imitate the ACPI code of parsing ACPI tables to dig and read ACPI > tables. Since some operations are not needed here, functions are > simplified. Functions will be used to dig only SRAT tables to get > information of memory, so that KASLR can the memory in immovable node. > > And also, these functions won't influence the initialization of > ACPI after start_kernel(). > > Since use physical address directely, so acpi_os_map_memory() > and acpi_os_unmap_memory() are not needed. > > Signed-off-by: Chao Fan > --- > arch/x86/boot/compressed/Makefile | 4 + > arch/x86/boot/compressed/acpitb.c | 401 ++ Does this logic live anywhere else in the kernel already? (i.e. could other code be reused?) -Kees -- Kees Cook Pixel Security
Re: [GIT PULL] Kernel lockdown for secure boot
On Tue, Apr 3, 2018 at 12:01 PM, Andy Lutomirski <l...@kernel.org> wrote: > On Tue, Apr 3, 2018 at 11:45 AM, Kees Cook <keesc...@chromium.org> wrote: >> On Tue, Apr 3, 2018 at 9:45 AM, Andy Lutomirski <l...@kernel.org> wrote: >>> On Tue, Apr 3, 2018 at 9:29 AM, Matthew Garrett <mj...@google.com> wrote: >>>> On Tue, Apr 3, 2018 at 8:11 AM Andy Lutomirski <l...@kernel.org> wrote: >>>>> Can you explain that much more clearly? I'm asking why booting via >>>>> UEFI Secure Boot should enable lockdown, and I don't see what this has >>>>> to do with kexec. And "someone blacklist[ing] your key in the >>>>> bootloader" sounds like a political issue, not a technical issue. >>>> >>>> A kernel that allows users arbitrary access to ring 0 is just an >>>> overfeatured bootloader. Why would you want secure boot in that case? >>> >>> To get a chain of trust. I can provision a system with some public >>> keys, stored in UEFI authenticated variables, such that the system >>> will only boot a signed image. That signed image, can, in turn, load >>> a signed (or hashed or otherwise verfified) kernel and a verified >>> initramfs. The initramfs can run a full system from a verified (using >>> dm-verity or similar) filesystem, for example. Now it's very hard to >>> persistently attack this system. Chromium OS does something very much >>> like this, except that it doesn't use UEFI as far as I know. So does >>> iOS, and so do some Android versions. >> >> Correct, Chrome OS does not use UEFI, and we still want this patch >> series, as it plugs all the known "intentional" escalation paths from >> uid-0 to ring-0. Happily, that means all the politics around the UEFI >> and Secure Boot case can be ignored, because those issues are specific >> to Secure Boot, not the lockdown series. (They are _related_, sure, >> but lockdown isn't only about Secure Boot -- it's just that SB is one >> of the widely deployed implementations of this kind of >> trust-chain-booting-thing. Chrome OS and Android's Verified Boot do >> similar things and have the same expectations about the uid-0/ring-0 >> separation.) >> >> The goal for that bright line on Chrome OS and Android is to stop >> attack persistence. We want to know that a reboot onto a new kernel >> and OS image will actually result in getting the desired system state, >> and that any attack on persistent system data (even for things running >> with full root privileges) can't result in using kernel interfaces to >> gain kernel control. This isn't expected to be _perfect_, since >> nothing is. But it creates a place to work from. The idea that uid-0 >> is NOT ring-0 is still relatively new, so the existing designs in the >> kernel aren't well suited to building that distinction. I view this >> series as a solid first step to getting there, though. > > But wouldn't Chrome OS possibly want to lock down kernel memory write > vectors but not read vectors? After all, debugging is useful even on > Chrome OS. Chrome OS absolutely wants to block writing. We also want to block reading as much as we possibly can, though yes we bump up against debugging in that quest. But those cases are manageable and specific, IMO. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Kernel lockdown for secure boot
On Tue, Apr 3, 2018 at 9:45 AM, Andy Lutomirski <l...@kernel.org> wrote: > On Tue, Apr 3, 2018 at 9:29 AM, Matthew Garrett <mj...@google.com> wrote: >> On Tue, Apr 3, 2018 at 8:11 AM Andy Lutomirski <l...@kernel.org> wrote: >>> Can you explain that much more clearly? I'm asking why booting via >>> UEFI Secure Boot should enable lockdown, and I don't see what this has >>> to do with kexec. And "someone blacklist[ing] your key in the >>> bootloader" sounds like a political issue, not a technical issue. >> >> A kernel that allows users arbitrary access to ring 0 is just an >> overfeatured bootloader. Why would you want secure boot in that case? > > To get a chain of trust. I can provision a system with some public > keys, stored in UEFI authenticated variables, such that the system > will only boot a signed image. That signed image, can, in turn, load > a signed (or hashed or otherwise verfified) kernel and a verified > initramfs. The initramfs can run a full system from a verified (using > dm-verity or similar) filesystem, for example. Now it's very hard to > persistently attack this system. Chromium OS does something very much > like this, except that it doesn't use UEFI as far as I know. So does > iOS, and so do some Android versions. Correct, Chrome OS does not use UEFI, and we still want this patch series, as it plugs all the known "intentional" escalation paths from uid-0 to ring-0. Happily, that means all the politics around the UEFI and Secure Boot case can be ignored, because those issues are specific to Secure Boot, not the lockdown series. (They are _related_, sure, but lockdown isn't only about Secure Boot -- it's just that SB is one of the widely deployed implementations of this kind of trust-chain-booting-thing. Chrome OS and Android's Verified Boot do similar things and have the same expectations about the uid-0/ring-0 separation.) The goal for that bright line on Chrome OS and Android is to stop attack persistence. We want to know that a reboot onto a new kernel and OS image will actually result in getting the desired system state, and that any attack on persistent system data (even for things running with full root privileges) can't result in using kernel interfaces to gain kernel control. This isn't expected to be _perfect_, since nothing is. But it creates a place to work from. The idea that uid-0 is NOT ring-0 is still relatively new, so the existing designs in the kernel aren't well suited to building that distinction. I view this series as a solid first step to getting there, though. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] hash addresses printed with %p
On Fri, Dec 1, 2017 at 7:34 AM, Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote: > And isn't there a specific %p modifier you should use for a kernel > pointer. I've lost the thread here for what should, or should not, be > done for kernel pointers these days based on the long email discussion. Current implementation to bypass the hashing is %px. (Though perhaps all %px usage should include a comment with a justification?) -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback
On Mon, Jul 24, 2017 at 12:07 PM, Brijesh Singh <brijesh.si...@amd.com> wrote: > From: Tom Lendacky <thomas.lenda...@amd.com> > > In prep for a new function that will need additional resource information > during the resource walk, update the resource walk callback to pass the > resource structure. Since the current callback start and end arguments > are pulled from the resource structure, the callback functions can obtain > them from the resource structure directly. > > Signed-off-by: Tom Lendacky <thomas.lenda...@amd.com> > Signed-off-by: Brijesh Singh <brijesh.si...@amd.com> This is a nice clean up even without the refactoring need. :) Reviewed-by: Kees Cook <keesc...@chromium.org> Thanks! -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/2] Add EFI capsule pstore backend support
On Thu, Jun 22, 2017 at 9:34 AM, Qiuxu Zhuo <qiuxu.z...@intel.com> wrote: > Change Log v3->v4: > - Add comment 'the number of config tables' for 'nr_config_table' in efi > structure > - Initialize 'efi.nr_config_table' to 0 in default > - Set 'efi.nr_config_table' to 'efi.systab->nr_tables' in > drivers/firmware/efi/arm-init.c -> uefi_init() > - Mark 'efi_capsule_pstore_disable' as __ro_after_init > - Use timestamp value passed from pstore API rather than using get_seconds() > - Pass page physcial addresses rather than struct page pointers to > efi_capsule_update() Thanks for the updates! Reviewed-by: Kees Cook <keesc...@chromium.org> > > Qiuxu Zhuo (2): > efi: Add 'nr_config_table' variable in efi structure > eif/capsule-pstore: Add capsule pstore backend > > arch/x86/platform/efi/efi.c | 1 + > drivers/firmware/efi/Kconfig | 21 ++ > drivers/firmware/efi/Makefile | 1 + > drivers/firmware/efi/arm-init.c | 4 +- > drivers/firmware/efi/capsule-pstore.c | 679 > ++ > drivers/firmware/efi/efi.c| 1 + > include/linux/efi.h | 1 + > 7 files changed, 707 insertions(+), 1 deletion(-) > create mode 100644 drivers/firmware/efi/capsule-pstore.c > > -- > 2.9.0.GIT > -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] eif/capsule-pstore: Add capsule pstore backend
if (atomic_long_cmpxchg(ring_offset, curr, next)) { > + curr = 0; > + break; > + } > + continue; > + } > + > + } while (atomic_long_cmpxchg(ring_offset, curr, next) != curr); > + > + return ring_buf + curr; > +} > + > +static int notrace efi_capsule_pstore_write(struct pstore_record *record) > +{ > + struct efi_capsule_pstore *pctx = record->psi->data; > + struct efi_capsule_pstore_record *rec; > + static atomic64_t seq; > + void *dst; > + > + /* > +* A zero size record would break our detection of > +* partially-filled capsules. > +*/ > + if (!record->size) > + return -EINVAL; > + > + if (record->type == PSTORE_TYPE_CONSOLE) { > + dst = get_pstore_write_ring_buf(>console, record->size); > + } else if (record->type == PSTORE_TYPE_FTRACE) { > + dst = get_pstore_write_ring_buf(>ftrace, record->size); > + } else if (record->type == PSTORE_TYPE_DMESG) { > + size_t size = record->size; > + > + rec = get_pstore_write_record(>dmesg, >size); > + if (!rec || (record->compressed && record->size < size)) > + return -ENOSPC; > + > + rec->type = record->type; > + rec->timestamp = get_seconds(); timestamp will already be correctly populated by the pstore API, so it should be copied from there rather than using get_seconds() which may not be functional during resume, etc. > + rec->size = record->size; > + if (!atomic64_read()) > + atomic64_set(, rec->timestamp << 32); > + record->id = atomic64_inc_return(); > + rec->id = record->id; > + rec->compressed = record->compressed; > + rec->magic = CAPSULE_MAGIC; > + > + dst = rec->data; > + } else if (record->type == PSTORE_TYPE_PMSG) { > + pr_warn_ratelimited("PMSG should not call %s\n", __func__); > + return -EINVAL; > + } else { > + return -EINVAL; > + } > + > + if (!dst) > + return -ENOSPC; > + > + memcpy(dst, record->buf, record->size); > + > + return 0; > +} > + > +static int notrace efi_capsule_pstore_write_user(struct pstore_record > *record, const char __user *buf) > +{ > + struct efi_capsule_pstore *pctx = record->psi->data; > + void *dst; > + > + if (record->type == PSTORE_TYPE_PMSG) { > + dst = get_pstore_write_ring_buf(>pmsg, record->size); > + if (!dst) > + return -ENOSPC; > + > + if (unlikely(__copy_from_user(dst, buf, record->size))) > + return -EFAULT; > + > + return 0; > + } > + > + return -EINVAL; > +} > + > +static __init int check_capsule_support(void) > +{ > + int rv; > + > + efi_guid_t guid = LINUX_EFI_CRASH_GUID; > + u32 flags = EFI_CAPSULE_PERSIST_ACROSS_RESET | > + EFI_CAPSULE_POPULATE_SYSTEM_TABLE; > + > + rv = efi_capsule_supported(guid, flags, CAPSULE_SIZE, > _reset_type); > + if (rv) > + return rv; > + > + return 0; > +} > + > +static __init int efi_capsule_pstore_init(void) > +{ > + int rv = 0; > + > + if (!efi_enabled(EFI_RUNTIME_SERVICES)) > + return -ENODEV; > + > + if (efi_capsule_pstore_disable) { > + pr_info("Capsule-pstore is disabled\n "); > + return 0; > + } > + > + rv = check_capsule_support(); > + if (rv) { > + pr_info("Capsule-pstore is not supported: %d\n ", rv); > + return rv; > + } > + > + rv = efi_capsule_pstore_setup(); > + if (rv) { > + pr_err("Fail to setup capsule-pstore: %d\n", rv); > + return rv; > + } > + > + /* > +* Once the memory reserved for capsules passed to the firmware > +* by EFI UpdateCapsule(), there's no way to let firmware give up > +* the capsules. So this module cannot be unloaded once loaded. > +*/ > + __module_get(THIS_MODULE); > + > + return 0; > +} > + > +static __exit void efi_capsule_pstore_exit(void) {} > + > +module_init(efi_capsule_pstore_init); > +module_exit(efi_capsule_pstore_exit); > + > +module_param_named(pstore_disable, efi_capsule_pstore_disable, bool, 0644); > + > +MODULE_DESCRIPTION("EFI capsule backend for pstore"); > +MODULE_LICENSE("GPL v2"); > -- > 2.9.0.GIT > I can't speak well to the EFI portions, but the after fixing the nits above, this looks fine to me. :) -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] efi-pstore: Fix write/erase id tracking
Prior to the pstore interface refactoring, the "id" generated during a backend pstore_write() was only retained by the internal pstore inode tracking list. Additionally the "part" was ignored, so EFI would encode this in the id. This corrects the misunderstandings and correctly sets "id" during pstore_write(), and uses "part" directly during pstore_erase(). Reported-by: Marta Lofstedt <marta.lofst...@intel.com> Fixes: 76cc9580e3fb ("pstore: Replace arguments for write() API") Fixes: a61072aae693 ("pstore: Replace arguments for erase() API") Signed-off-by: Kees Cook <keesc...@chromium.org> --- Since the pstore refactor broke this, I intend to push this via the pstore tree. --- drivers/firmware/efi/efi-pstore.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 44148fd4c9f2..dda2e96328c0 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -53,6 +53,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry, if (sscanf(name, "dump-type%u-%u-%d-%lu-%c", >type, , , , _type) == 5) { record->id = generic_id(time, part, cnt); + record->part = part; record->count = cnt; record->time.tv_sec = time; record->time.tv_nsec = 0; @@ -64,6 +65,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry, } else if (sscanf(name, "dump-type%u-%u-%d-%lu", >type, , , ) == 4) { record->id = generic_id(time, part, cnt); + record->part = part; record->count = cnt; record->time.tv_sec = time; record->time.tv_nsec = 0; @@ -77,6 +79,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry, * multiple logs, remains. */ record->id = generic_id(time, part, 0); + record->part = part; record->count = 0; record->time.tv_sec = time; record->time.tv_nsec = 0; @@ -241,9 +244,15 @@ static int efi_pstore_write(struct pstore_record *record) efi_guid_t vendor = LINUX_EFI_CRASH_GUID; int i, ret = 0; + record->time.tv_sec = get_seconds(); + record->time.tv_nsec = 0; + + record->id = generic_id(record->time.tv_sec, record->part, + record->count); + snprintf(name, sizeof(name), "dump-type%u-%u-%d-%lu-%c", record->type, record->part, record->count, -get_seconds(), record->compressed ? 'C' : 'D'); +record->time.tv_sec, record->compressed ? 'C' : 'D'); for (i = 0; i < DUMP_NAME_LEN; i++) efi_name[i] = name[i]; @@ -255,7 +264,6 @@ static int efi_pstore_write(struct pstore_record *record) if (record->reason == KMSG_DUMP_OOPS) efivar_run_worker(); - record->id = record->part; return ret; }; @@ -287,7 +295,7 @@ static int efi_pstore_erase_func(struct efivar_entry *entry, void *data) * holding multiple logs, remains. */ snprintf(name_old, sizeof(name_old), "dump-type%u-%u-%lu", - ed->record->type, (unsigned int)ed->record->id, + ed->record->type, ed->record->part, ed->record->time.tv_sec); for (i = 0; i < DUMP_NAME_LEN; i++) @@ -320,10 +328,7 @@ static int efi_pstore_erase(struct pstore_record *record) char name[DUMP_NAME_LEN]; efi_char16_t efi_name[DUMP_NAME_LEN]; int found, i; - unsigned int part; - do_div(record->id, 1000); - part = do_div(record->id, 100); snprintf(name, sizeof(name), "dump-type%u-%u-%d-%lu", record->type, record->part, record->count, record->time.tv_sec); -- 2.7.4 -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi-pstore: Fix read iter after pstore API refactor
On Thu, May 18, 2017 at 9:18 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 18 May 2017 at 14:01, Kees Cook <keesc...@chromium.org> wrote: >> On Thu, May 18, 2017 at 3:35 AM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 12 May 2017 at 22:58, Kees Cook <keesc...@chromium.org> wrote: >>>> During the internal pstore API refactoring, the EFI vars read entry was >>>> accidentally made to update a stack variable instead of the pstore >>>> private data pointer. This corrects the problem (and removes the now >>>> needless argument). >>>> >>>> Signed-off-by: Kees Cook <keesc...@chromium.org> >>> >>> Does this need a cc stable? >> >> No, the refactor first appeared in 4.12-rc1. >> > > OK. Applied. Err, eek, no, it's already gone into Linus's tree. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi-pstore: Fix read iter after pstore API refactor
On Thu, May 18, 2017 at 3:35 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 12 May 2017 at 22:58, Kees Cook <keesc...@chromium.org> wrote: >> During the internal pstore API refactoring, the EFI vars read entry was >> accidentally made to update a stack variable instead of the pstore >> private data pointer. This corrects the problem (and removes the now >> needless argument). >> >> Signed-off-by: Kees Cook <keesc...@chromium.org> > > Does this need a cc stable? No, the refactor first appeared in 4.12-rc1. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] efi-pstore: Fix read iter after pstore API refactor
During the internal pstore API refactoring, the EFI vars read entry was accidentally made to update a stack variable instead of the pstore private data pointer. This corrects the problem (and removes the now needless argument). Signed-off-by: Kees Cook <keesc...@chromium.org> --- drivers/firmware/efi/efi-pstore.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 93d8cdbe7ef4..9e6f14c354f1 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -155,25 +155,20 @@ static int efi_pstore_scan_sysfs_exit(struct efivar_entry *pos, * efi_pstore_sysfs_entry_iter * * @record: pstore record to pass to callback - * @pos: entry to begin iterating from * * You MUST call efivar_enter_iter_begin() before this function, and * efivar_entry_iter_end() afterwards. * - * It is possible to begin iteration from an arbitrary entry within - * the list by passing @pos. @pos is updated on return to point to - * the next entry of the last one passed to efi_pstore_read_func(). - * To begin iterating from the beginning of the list @pos must be %NULL. */ -static int efi_pstore_sysfs_entry_iter(struct pstore_record *record, - struct efivar_entry **pos) +static int efi_pstore_sysfs_entry_iter(struct pstore_record *record) { + struct efivar_entry *pos = (struct efivar_entry *)record->psi->data; struct efivar_entry *entry, *n; struct list_head *head = _sysfs_list; int size = 0; int ret; - if (!*pos) { + if (!pos) { list_for_each_entry_safe(entry, n, head, list) { efi_pstore_scan_sysfs_enter(entry, n, head); @@ -185,21 +180,21 @@ static int efi_pstore_sysfs_entry_iter(struct pstore_record *record, if (size) break; } - *pos = n; + pos = n; return size; } - list_for_each_entry_safe_from((*pos), n, head, list) { - efi_pstore_scan_sysfs_enter((*pos), n, head); + list_for_each_entry_safe_from(pos, n, head, list) { + efi_pstore_scan_sysfs_enter(pos, n, head); - size = efi_pstore_read_func((*pos), record); - ret = efi_pstore_scan_sysfs_exit((*pos), n, head, size < 0); + size = efi_pstore_read_func(pos, record); + ret = efi_pstore_scan_sysfs_exit(pos, n, head, size < 0); if (ret) return ret; if (size) break; } - *pos = n; + pos = n; return size; } @@ -218,7 +213,6 @@ static int efi_pstore_sysfs_entry_iter(struct pstore_record *record, */ static ssize_t efi_pstore_read(struct pstore_record *record) { - struct efivar_entry *entry = (struct efivar_entry *)record->psi->data; ssize_t size; record->buf = kzalloc(EFIVARS_DATA_SIZE_MAX, GFP_KERNEL); @@ -229,7 +223,7 @@ static ssize_t efi_pstore_read(struct pstore_record *record) size = -EINTR; goto out; } - size = efi_pstore_sysfs_entry_iter(record, ); + size = efi_pstore_sysfs_entry_iter(record); efivar_entry_iter_end(); out: -- 2.7.4 -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/24] x86: Restrict MSR access when the kernel is locked down
On Wed, Apr 5, 2017 at 10:12 AM, David Howells <dhowe...@redhat.com> wrote: > From: Matthew Garrett <matthew.garr...@nebula.com> > > Writing to MSRs should not be allowed if the kernel is locked down, since > it could lead to execution of arbitrary code in kernel mode. Based on a > patch by Kees Cook. > > Cc: Kees Cook <keesc...@chromium.org> > Signed-off-by: Matthew Garrett <matthew.garr...@nebula.com> > Signed-off-by: David Howells <dhowe...@redhat.com> Acked-by: Kees Cook <keesc...@chromium.org> -Kees > --- > > arch/x86/kernel/msr.c |7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c > index ef688804f80d..fbcce028e502 100644 > --- a/arch/x86/kernel/msr.c > +++ b/arch/x86/kernel/msr.c > @@ -84,6 +84,9 @@ static ssize_t msr_write(struct file *file, const char > __user *buf, > int err = 0; > ssize_t bytes = 0; > > + if (kernel_is_locked_down()) > + return -EPERM; > + > if (count % 8) > return -EINVAL; /* Invalid chunk size */ > > @@ -131,6 +134,10 @@ static long msr_ioctl(struct file *file, unsigned int > ioc, unsigned long arg) > err = -EBADF; > break; > } > + if (kernel_is_locked_down()) { > + err = -EPERM; > + break; > + } > if (copy_from_user(, uregs, sizeof regs)) { > err = -EFAULT; > break; > -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/18] pstore: Extract common arguments into structure
On Tue, Mar 7, 2017 at 8:22 AM, Namhyung Kim <namhy...@gmail.com> wrote: > On Tue, Mar 7, 2017 at 6:55 AM, Kees Cook <keesc...@chromium.org> wrote: >> The read/mkfile pair pass the same arguments and should be cleared >> between calls. Move to a structure and wipe it after every loop. >> >> Signed-off-by: Kees Cook <keesc...@chromium.org> >> --- >> fs/pstore/platform.c | 55 >> +++--- >> include/linux/pstore.h | 28 - >> 2 files changed, 57 insertions(+), 26 deletions(-) >> >> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c >> index 320a673ecb5b..3fa1575a6e36 100644 >> --- a/fs/pstore/platform.c >> +++ b/fs/pstore/platform.c >> @@ -766,16 +766,9 @@ EXPORT_SYMBOL_GPL(pstore_unregister); >> void pstore_get_records(int quiet) >> { >> struct pstore_info *psi = psinfo; >> - char*buf = NULL; >> - ssize_t size; >> - u64 id; >> - int count; >> - enum pstore_type_id type; >> - struct timespec time; >> + struct pstore_recordrecord = { .psi = psi, }; >> int failed = 0, rc; >> - boolcompressed; >> int unzipped_len = -1; >> - ssize_t ecc_notice_size = 0; >> >> if (!psi) >> return; >> @@ -784,39 +777,51 @@ void pstore_get_records(int quiet) >> if (psi->open && psi->open(psi)) >> goto out; >> >> - while ((size = psi->read(, , , , , >> , >> -_notice_size, psi)) > 0) { >> - if (compressed && (type == PSTORE_TYPE_DMESG)) { >> + while ((record.size = psi->read(, , >> +, , >> +, , >> +_notice_size, >> +record.psi)) > 0) { >> + if (record.compressed && >> + record.type == PSTORE_TYPE_DMESG) { >> if (big_oops_buf) >> - unzipped_len = pstore_decompress(buf, >> - big_oops_buf, size, >> + unzipped_len = pstore_decompress( >> + record.buf, >> + big_oops_buf, >> + record.size, >> big_oops_buf_sz); >> >> if (unzipped_len > 0) { >> - if (ecc_notice_size) >> + if (record.ecc_notice_size) >> memcpy(big_oops_buf + unzipped_len, >> - buf + size, ecc_notice_size); >> - kfree(buf); >> - buf = big_oops_buf; >> - size = unzipped_len; >> - compressed = false; >> + record.buf + recorrecord.size, > > A typo on record.size. Thanks! Yeah, 0-day noticed this too. I've refreshed the patches in my tree with the correction now. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/18] pstore: Avoid race in module unloading
On Tue, Mar 7, 2017 at 8:16 AM, Namhyung Kim <namhy...@gmail.com> wrote: > Hi Kees, > > On Tue, Mar 7, 2017 at 6:55 AM, Kees Cook <keesc...@chromium.org> wrote: >> Technically, it might be possible for struct pstore_info to go out of >> scope after the module_put(), so report the backend name first. > > But in that case, using pstore will crash the kernel anyway, right? > If so, why pstore doesn't keep a reference until unregister? > Do I miss something? I could be wrong with this, since the backend can't call unregister until register has finished... I'll drop this patch. -Kees > > Thanks, > Namhyung > > >> >> Signed-off-by: Kees Cook <keesc...@chromium.org> >> --- >> fs/pstore/platform.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c >> index 074fe85a2078..d69ef8a840b9 100644 >> --- a/fs/pstore/platform.c >> +++ b/fs/pstore/platform.c >> @@ -722,10 +722,10 @@ int pstore_register(struct pstore_info *psi) >> */ >> backend = psi->name; >> >> - module_put(owner); >> - >> pr_info("Registered %s as persistent store backend\n", psi->name); >> >> + module_put(owner); >> + >> return 0; >> } >> EXPORT_SYMBOL_GPL(pstore_register); >> -- >> 2.7.4 >> -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/18] pstore: Shut down worker when unregistering
When built as a module and running with update_ms >= 0, pstore will Oops during module unload since the work timer is still running. This makes sure the worker is stopped before unloading. Signed-off-by: Kees Cook <keesc...@chromium.org> Cc: sta...@vger.kernel.org --- fs/pstore/platform.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index cfc1abd264d9..074fe85a2078 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -709,6 +709,7 @@ int pstore_register(struct pstore_info *psi) if (psi->flags & PSTORE_FLAGS_PMSG) pstore_register_pmsg(); + /* Start watching for new records, if desired. */ if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + msecs_to_jiffies(pstore_update_ms); @@ -731,6 +732,11 @@ EXPORT_SYMBOL_GPL(pstore_register); void pstore_unregister(struct pstore_info *psi) { + /* Stop timer and make sure all work has finished. */ + pstore_update_ms = -1; + del_timer_sync(_timer); + flush_work(_work); + if (psi->flags & PSTORE_FLAGS_PMSG) pstore_unregister_pmsg(); if (psi->flags & PSTORE_FLAGS_FTRACE) @@ -830,7 +836,9 @@ static void pstore_timefunc(unsigned long dummy) schedule_work(_work); } - mod_timer(_timer, jiffies + msecs_to_jiffies(pstore_update_ms)); + if (pstore_update_ms >= 0) + mod_timer(_timer, + jiffies + msecs_to_jiffies(pstore_update_ms)); } module_param(backend, charp, 0444); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/18] pstore: Improve register_pstore() error reporting
Uncommon errors are better to get reported to dmesg so developers can more easily figure out why pstore is unhappy with a backend attempting to register. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index d69ef8a840b9..320a673ecb5b 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -673,11 +673,15 @@ int pstore_register(struct pstore_info *psi) { struct module *owner = psi->owner; - if (backend && strcmp(backend, psi->name)) + if (backend && strcmp(backend, psi->name)) { + pr_warn("ignoring unexpected backend '%s'\n", psi->name); return -EPERM; + } spin_lock(_lock); if (psinfo) { + pr_warn("backend '%s' already loaded: ignoring '%s'\n", + psinfo->name, psi->name); spin_unlock(_lock); return -EBUSY; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/18] pstore: Allocate records on heap instead of stack
In preparation for handling records off to pstore_mkfile(), allocate the record instead of reusing stack. This still always frees the record, though, since pstore_mkfile() isn't yet keeping it. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index d897e2f11b6a..072326625629 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -818,8 +818,7 @@ static void decompress_record(struct pstore_record *record) void pstore_get_records(int quiet) { struct pstore_info *psi = psinfo; - struct pstore_recordrecord = { .psi = psi, }; - int failed = 0, rc; + int failed = 0; if (!psi) return; @@ -833,19 +832,34 @@ void pstore_get_records(int quiet) * may reallocate record.buf. On success, pstore_mkfile() will keep * the record.buf, so free it only on failure. */ - while ((record.size = psi->read()) > 0) { - decompress_record(); - rc = pstore_mkfile(); + for (;;) { + struct pstore_record *record; + int rc; + + record = kzalloc(sizeof(*record), GFP_KERNEL); + if (!record) { + pr_err("out of memory creating record\n"); + break; + } + record->psi = psi; + + record->size = psi->read(record); + + /* No more records left in backend? */ + if (record->size <= 0) + break; + + decompress_record(record); + rc = pstore_mkfile(record); if (rc) { /* pstore_mkfile() did not take buf, so free it. */ - kfree(record.buf); + kfree(record->buf); if (rc != -EEXIST || !quiet) failed++; } /* Reset for next record. */ - memset(, 0, sizeof(record)); - record.psi = psi; + kfree(record); } if (psi->close) psi->close(psi); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/18] pstore: Pass record contents instead of copying
pstore_mkfile() shouldn't have to memcpy the record contents. It can use the existing copy instead. This adjusts the allocation lifetime management and renames the contents variable from "data" to "buf" to assist moving to struct pstore_record in the future. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/inode.c| 22 +++--- fs/pstore/platform.c | 16 2 files changed, 27 insertions(+), 11 deletions(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index a98787bab3e6..3d1f047e4f41 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -52,7 +52,7 @@ struct pstore_private { u64 id; int count; ssize_t size; - chardata[]; + char*buf; }; struct pstore_ftrace_seq_data { @@ -63,6 +63,14 @@ struct pstore_ftrace_seq_data { #define REC_SIZE sizeof(struct pstore_ftrace_record) +static void free_pstore_private(struct pstore_private *private) +{ + if (!private) + return; + kfree(private->buf); + kfree(private); +} + static void *pstore_ftrace_seq_start(struct seq_file *s, loff_t *pos) { struct pstore_private *ps = s->private; @@ -105,7 +113,7 @@ static int pstore_ftrace_seq_show(struct seq_file *s, void *v) { struct pstore_private *ps = s->private; struct pstore_ftrace_seq_data *data = v; - struct pstore_ftrace_record *rec = (void *)(ps->data + data->off); + struct pstore_ftrace_record *rec = (void *)(ps->buf + data->off); seq_printf(s, "CPU:%d ts:%llu %08lx %08lx %pf <- %pF\n", pstore_ftrace_decode_cpu(rec), @@ -143,7 +151,7 @@ static ssize_t pstore_file_read(struct file *file, char __user *userbuf, if (ps->type == PSTORE_TYPE_FTRACE) return seq_read(file, userbuf, count, ppos); - return simple_read_from_buffer(userbuf, count, ppos, ps->data, ps->size); + return simple_read_from_buffer(userbuf, count, ppos, ps->buf, ps->size); } static int pstore_file_open(struct inode *inode, struct file *file) @@ -221,7 +229,7 @@ static void pstore_evict_inode(struct inode *inode) spin_lock_irqsave(_lock, flags); list_del(>list); spin_unlock_irqrestore(_lock, flags); - kfree(p); + free_pstore_private(p); } } @@ -332,7 +340,7 @@ int pstore_mkfile(struct pstore_record *record) goto fail; inode->i_mode = S_IFREG | 0444; inode->i_fop = _file_operations; - private = kmalloc(sizeof *private + size, GFP_KERNEL); + private = kzalloc(sizeof(*private), GFP_KERNEL); if (!private) goto fail_alloc; private->type = record->type; @@ -394,7 +402,7 @@ int pstore_mkfile(struct pstore_record *record) if (!dentry) goto fail_lockedalloc; - memcpy(private->data, record->buf, size); + private->buf = record->buf; inode->i_size = private->size = size; inode->i_private = private; @@ -414,7 +422,7 @@ int pstore_mkfile(struct pstore_record *record) fail_lockedalloc: inode_unlock(d_inode(root)); - kfree(private); + free_pstore_private(private); fail_alloc: iput(inode); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index c0d401e732e6..d897e2f11b6a 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -828,14 +828,22 @@ void pstore_get_records(int quiet) if (psi->open && psi->open(psi)) goto out; + /* +* Backend callback read() allocates record.buf. decompress_record() +* may reallocate record.buf. On success, pstore_mkfile() will keep +* the record.buf, so free it only on failure. +*/ while ((record.size = psi->read()) > 0) { decompress_record(); rc = pstore_mkfile(); + if (rc) { + /* pstore_mkfile() did not take buf, so free it. */ + kfree(record.buf); + if (rc != -EEXIST || !quiet) + failed++; + } - if (rc && (rc != -EEXIST || !quiet)) - failed++; - - kfree(record.buf); + /* Reset for next record. */ memset(, 0, sizeof(record)); record.psi = psi; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 18/18] pstore: Remove write_buf() callback
Now that write() and write_buf() are functionally identical, this removes write_buf(), and renames write_buf_user() to write_user(). Additionally adds sanity-checks for pstore_info's declared functions and flags at registration time. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/ftrace.c | 4 ++-- fs/pstore/platform.c | 35 --- fs/pstore/pmsg.c | 4 ++-- fs/pstore/ram.c| 10 +- include/linux/pstore.h | 29 ++--- 5 files changed, 39 insertions(+), 43 deletions(-) diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a5506ec6995e..06aab07b6bb7 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -53,7 +53,7 @@ static void notrace pstore_ftrace_call(unsigned long ip, rec.parent_ip = parent_ip; pstore_ftrace_write_timestamp(, pstore_ftrace_stamp++); pstore_ftrace_encode_cpu(, raw_smp_processor_id()); - psinfo->write_buf(); + psinfo->write(); local_irq_restore(flags); } @@ -122,7 +122,7 @@ void pstore_register_ftrace(void) { struct dentry *file; - if (!psinfo->write_buf) + if (!psinfo->write) return; pstore_ftrace_dir = debugfs_create_dir("pstore", NULL); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 1e6642a2063e..e79f170fa79b 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -604,7 +604,7 @@ static void pstore_console_write(struct console *con, const char *s, unsigned c) } record.buf = (char *)s; record.size = c; - psinfo->write_buf(); + psinfo->write(); spin_unlock_irqrestore(>buf_lock, flags); s += c; c = e - s; @@ -632,15 +632,8 @@ static void pstore_register_console(void) {} static void pstore_unregister_console(void) {} #endif -static int pstore_write_compat(struct pstore_record *record) -{ - record->buf = psinfo->buf; - - return record->psi->write_buf(record); -} - -static int pstore_write_buf_user_compat(struct pstore_record *record, - const char __user *buf) +static int pstore_write_user_compat(struct pstore_record *record, + const char __user *buf) { unsigned long flags = 0; size_t i, bufsize, total_size = record->size; @@ -662,7 +655,7 @@ static int pstore_write_buf_user_compat(struct pstore_record *record, break; } record->size = c; - ret = record->psi->write_buf(record); + ret = record->psi->write(record); if (unlikely(ret < 0)) break; i += c; @@ -687,6 +680,20 @@ int pstore_register(struct pstore_info *psi) return -EPERM; } + /* Sanity check flags. */ + if (!psi->flags) { + pr_warn("backend '%s' must support at least one frontend\n", + psi->name); + return -EINVAL; + } + + /* Check for required functions. */ + if (!psi->read || !psi->write) { + pr_warn("backend '%s' must implement read() and write()\n", + psi->name); + return -EINVAL; + } + spin_lock(_lock); if (psinfo) { pr_warn("backend '%s' already loaded: ignoring '%s'\n", @@ -695,10 +702,8 @@ int pstore_register(struct pstore_info *psi) return -EBUSY; } - if (!psi->write) - psi->write = pstore_write_compat; - if (!psi->write_buf_user) - psi->write_buf_user = pstore_write_buf_user_compat; + if (!psi->write_user) + psi->write_user = pstore_write_user_compat; psinfo = psi; mutex_init(>read_mutex); spin_unlock(_lock); diff --git a/fs/pstore/pmsg.c b/fs/pstore/pmsg.c index ce35907602de..c16a2477e106 100644 --- a/fs/pstore/pmsg.c +++ b/fs/pstore/pmsg.c @@ -33,12 +33,12 @@ static ssize_t write_pmsg(struct file *file, const char __user *buf, if (!count) return 0; - /* check outside lock, page in any data. write_buf_user also checks */ + /* check outside lock, page in any data. write_user also checks */ if (!access_ok(VERIFY_READ, buf, count)) return -EFAULT; mutex_lock(_lock); - ret = psinfo->write_buf_user(, buf); + ret = psinfo->write_user(, buf); mutex_unlock(_lock); return ret ? ret : count; } diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index d85e1adae1b6..5523df7f17ef 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -378,7 +378,7 @@ static size_t ramoops_write_kmsg_hdr(struct per
[PATCH 11/18] pstore: Always allocate buffer for decompression
Currently, pstore_mkfile() performs a memcpy() of the record contents, so it can live anywhere. However, this is needlessly wasteful. In preparation of pstore_mkfile() keeping the record contents, always allocate a buffer for the contents. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 38 -- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 879658b4c679..c0d401e732e6 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -768,6 +768,7 @@ EXPORT_SYMBOL_GPL(pstore_unregister); static void decompress_record(struct pstore_record *record) { int unzipped_len; + char *decompressed; /* Only PSTORE_TYPE_DMESG support compression. */ if (!record->compressed || record->type != PSTORE_TYPE_DMESG) { @@ -783,17 +784,29 @@ static void decompress_record(struct pstore_record *record) unzipped_len = pstore_decompress(record->buf, big_oops_buf, record->size, big_oops_buf_sz); - if (unzipped_len > 0) { - if (record->ecc_notice_size) - memcpy(big_oops_buf + unzipped_len, - record->buf + record->size, - record->ecc_notice_size); - kfree(record->buf); - record->buf = big_oops_buf; - record->size = unzipped_len; - record->compressed = false; - } else + if (unzipped_len <= 0) { pr_err("decompression failed: %d\n", unzipped_len); + return; + } + + /* Build new buffer for decompressed contents. */ + decompressed = kmalloc(unzipped_len + record->ecc_notice_size, + GFP_KERNEL); + if (!decompressed) { + pr_err("decompression ran out of memory\n"); + return; + } + memcpy(decompressed, big_oops_buf, unzipped_len); + + /* Append ECC notice to decompressed buffer. */ + memcpy(decompressed + unzipped_len, record->buf + record->size, + record->ecc_notice_size); + + /* Swap out compresed contents with decompressed contents. */ + kfree(record->buf); + record->buf = decompressed; + record->size = unzipped_len; + record->compressed = false; } /* @@ -819,13 +832,10 @@ void pstore_get_records(int quiet) decompress_record(); rc = pstore_mkfile(); - /* Free buffer other than big oops */ - if (record.buf != big_oops_buf) - kfree(record.buf); - if (rc && (rc != -EEXIST || !quiet)) failed++; + kfree(record.buf); memset(, 0, sizeof(record)); record.psi = psi; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/18] pstore: Replace arguments for erase() API
This removes the argument list for the erase() callback and replaces it with a pointer to the backend record details to be removed. Signed-off-by: Kees Cook <keesc...@chromium.org> --- drivers/acpi/apei/erst.c | 8 +++- drivers/firmware/efi/efi-pstore.c | 26 +++--- fs/pstore/inode.c | 12 +--- fs/pstore/ram.c | 15 +++ include/linux/pstore.h| 16 +--- 5 files changed, 31 insertions(+), 46 deletions(-) diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index 440588d189e7..7207e5fc9d3d 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -927,8 +927,7 @@ static int erst_open_pstore(struct pstore_info *psi); static int erst_close_pstore(struct pstore_info *psi); static ssize_t erst_reader(struct pstore_record *record); static int erst_writer(struct pstore_record *record); -static int erst_clearer(enum pstore_type_id type, u64 id, int count, - struct timespec time, struct pstore_info *psi); +static int erst_clearer(struct pstore_record *record); static struct pstore_info erst_info = { .owner = THIS_MODULE, @@ -1100,10 +1099,9 @@ static int erst_writer(struct pstore_record *record) return ret; } -static int erst_clearer(enum pstore_type_id type, u64 id, int count, - struct timespec time, struct pstore_info *psi) +static int erst_clearer(struct pstore_record *record) { - return erst_clear(id); + return erst_clear(record->id); } static int __init erst_init(void) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index f81e3ec6f1c0..93d8cdbe7ef4 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -266,10 +266,7 @@ static int efi_pstore_write(struct pstore_record *record) }; struct pstore_erase_data { - u64 id; - enum pstore_type_id type; - int count; - struct timespec time; + struct pstore_record *record; efi_char16_t *name; }; @@ -295,8 +292,9 @@ static int efi_pstore_erase_func(struct efivar_entry *entry, void *data) * Check if an old format, which doesn't support * holding multiple logs, remains. */ - sprintf(name_old, "dump-type%u-%u-%lu", ed->type, - (unsigned int)ed->id, ed->time.tv_sec); + snprintf(name_old, sizeof(name_old), "dump-type%u-%u-%lu", + ed->record->type, (unsigned int)ed->record->id, + ed->record->time.tv_sec); for (i = 0; i < DUMP_NAME_LEN; i++) efi_name_old[i] = name_old[i]; @@ -321,8 +319,7 @@ static int efi_pstore_erase_func(struct efivar_entry *entry, void *data) return 1; } -static int efi_pstore_erase(enum pstore_type_id type, u64 id, int count, - struct timespec time, struct pstore_info *psi) +static int efi_pstore_erase(struct pstore_record *record) { struct pstore_erase_data edata; struct efivar_entry *entry = NULL; @@ -331,17 +328,16 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 id, int count, int found, i; unsigned int part; - do_div(id, 1000); - part = do_div(id, 100); - sprintf(name, "dump-type%u-%u-%d-%lu", type, part, count, time.tv_sec); + do_div(record->id, 1000); + part = do_div(record->id, 100); + snprintf(name, sizeof(name), "dump-type%u-%u-%d-%lu", +record->type, record->part, record->count, +record->time.tv_sec); for (i = 0; i < DUMP_NAME_LEN; i++) efi_name[i] = name[i]; - edata.id = part; - edata.type = type; - edata.count = count; - edata.time = time; + edata.record = record; edata.name = efi_name; if (efivar_entry_iter_begin()) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 0ea281b457fa..06504b69575b 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -210,14 +210,12 @@ static int pstore_unlink(struct inode *dir, struct dentry *dentry) if (err) return err; - if (record->psi->erase) { - mutex_lock(>psi->read_mutex); - record->psi->erase(record->type, record->id, record->count, - d_inode(dentry)->i_ctime, record->psi); - mutex_unlock(>psi->read_mutex); - } else { + if (!record->psi->erase) return -EPERM; - } + + mutex_lock(>psi->read_mutex); + record->psi->erase(record); + mutex_unlock(>psi->read_mutex); return simple_unlink(dir, dentry); } diff --g
[PATCH 08/18] pstore: Switch pstore_mkfile to pass record
Instead of the long list of arguments, just pass the new record struct. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/inode.c| 57 +--- fs/pstore/internal.h | 5 + fs/pstore/platform.c | 6 +- 3 files changed, 34 insertions(+), 34 deletions(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 57c0646479f5..a98787bab3e6 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -302,9 +302,7 @@ bool pstore_is_mounted(void) * Load it up with "size" bytes of data from "buf". * Set the mtime & ctime to the date that this record was originally stored. */ -int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, - char *data, bool compressed, size_t size, - struct timespec time, struct pstore_info *psi) +int pstore_mkfile(struct pstore_record *record) { struct dentry *root = pstore_sb->s_root; struct dentry *dentry; @@ -313,12 +311,13 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, charname[PSTORE_NAMELEN]; struct pstore_private *private, *pos; unsigned long flags; + size_t size = record->size + record->ecc_notice_size; spin_lock_irqsave(_lock, flags); list_for_each_entry(pos, , list) { - if (pos->type == type && - pos->id == id && - pos->psi == psi) { + if (pos->type == record->type && + pos->id == record->id && + pos->psi == record->psi) { rc = -EEXIST; break; } @@ -336,48 +335,56 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, private = kmalloc(sizeof *private + size, GFP_KERNEL); if (!private) goto fail_alloc; - private->type = type; - private->id = id; - private->count = count; - private->psi = psi; + private->type = record->type; + private->id = record->id; + private->count = record->count; + private->psi = record->psi; - switch (type) { + switch (record->type) { case PSTORE_TYPE_DMESG: scnprintf(name, sizeof(name), "dmesg-%s-%lld%s", - psname, id, compressed ? ".enc.z" : ""); + record->psi->name, record->id, + record->compressed ? ".enc.z" : ""); break; case PSTORE_TYPE_CONSOLE: - scnprintf(name, sizeof(name), "console-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "console-%s-%lld", + record->psi->name, record->id); break; case PSTORE_TYPE_FTRACE: - scnprintf(name, sizeof(name), "ftrace-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "ftrace-%s-%lld", + record->psi->name, record->id); break; case PSTORE_TYPE_MCE: - scnprintf(name, sizeof(name), "mce-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "mce-%s-%lld", + record->psi->name, record->id); break; case PSTORE_TYPE_PPC_RTAS: - scnprintf(name, sizeof(name), "rtas-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "rtas-%s-%lld", + record->psi->name, record->id); break; case PSTORE_TYPE_PPC_OF: scnprintf(name, sizeof(name), "powerpc-ofw-%s-%lld", - psname, id); + record->psi->name, record->id); break; case PSTORE_TYPE_PPC_COMMON: scnprintf(name, sizeof(name), "powerpc-common-%s-%lld", - psname, id); + record->psi->name, record->id); break; case PSTORE_TYPE_PMSG: - scnprintf(name, sizeof(name), "pmsg-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "pmsg-%s-%lld", + record->psi->name, record->id); break; case PSTORE_TYPE_PPC_OPAL: - sprintf(name, "powerpc-opal-%s-%lld", psname, id); + scnprintf(name, sizeof(name), "powerpc-opal-%s-%lld", + record->psi->name, record->id); break; case PSTORE
[PATCH 00/18] pstore: refactor internal APIs
For a long time I've been bothered by the complexity of argument passing in the pstore internals, which makes understanding things and changing things extremely fragile. With the proposal of a new backend (EPI capsules), and my attempts to reorganize things for the proposed multiple-pmsg frontend, I've spent some time refactoring the internal pstore API to pass a record structure instead of lots of arguments. This hugely simplifies things, and lets me document the API better, reduce memory copies, etc. Included in the series are a few bug fixes as well. If pstore backend authors could review the changes I've made in their drivers, I'd appreciate it. Hopefully I got it all right. :) And unless there are any objections, I'll put this into -next in a couple days. I intend to continue working on this to fix up the "ecc_notification_size" field and make it more generalized, and get multiple pmsg support added too, but I'd like to get this first series out for review first, since it's rather long already. Thanks! -Kees -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/18] pstore: Do not duplicate record metadata
This switches the inode-private data from carrying duplicate metadata to keeping the record passed in during pstore_mkfile(). Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/inode.c| 57 ++-- fs/pstore/platform.c | 6 ++ 2 files changed, 30 insertions(+), 33 deletions(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 3d1f047e4f41..0ea281b457fa 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -47,12 +47,8 @@ static LIST_HEAD(allpstore); struct pstore_private { struct list_head list; - struct pstore_info *psi; - enum pstore_type_id type; - u64 id; - int count; - ssize_t size; - char*buf; + struct pstore_record *record; + size_t total_size; }; struct pstore_ftrace_seq_data { @@ -67,7 +63,10 @@ static void free_pstore_private(struct pstore_private *private) { if (!private) return; - kfree(private->buf); + if (private->record) { + kfree(private->record->buf); + kfree(private->record); + } kfree(private); } @@ -80,9 +79,9 @@ static void *pstore_ftrace_seq_start(struct seq_file *s, loff_t *pos) if (!data) return NULL; - data->off = ps->size % REC_SIZE; + data->off = ps->total_size % REC_SIZE; data->off += *pos * REC_SIZE; - if (data->off + REC_SIZE > ps->size) { + if (data->off + REC_SIZE > ps->total_size) { kfree(data); return NULL; } @@ -102,7 +101,7 @@ static void *pstore_ftrace_seq_next(struct seq_file *s, void *v, loff_t *pos) struct pstore_ftrace_seq_data *data = v; data->off += REC_SIZE; - if (data->off + REC_SIZE > ps->size) + if (data->off + REC_SIZE > ps->total_size) return NULL; (*pos)++; @@ -113,7 +112,9 @@ static int pstore_ftrace_seq_show(struct seq_file *s, void *v) { struct pstore_private *ps = s->private; struct pstore_ftrace_seq_data *data = v; - struct pstore_ftrace_record *rec = (void *)(ps->buf + data->off); + struct pstore_ftrace_record *rec; + + rec = (struct pstore_ftrace_record *)(ps->record->buf + data->off); seq_printf(s, "CPU:%d ts:%llu %08lx %08lx %pf <- %pF\n", pstore_ftrace_decode_cpu(rec), @@ -133,7 +134,7 @@ static const struct seq_operations pstore_ftrace_seq_ops = { static int pstore_check_syslog_permissions(struct pstore_private *ps) { - switch (ps->type) { + switch (ps->record->type) { case PSTORE_TYPE_DMESG: case PSTORE_TYPE_CONSOLE: return check_syslog_permissions(SYSLOG_ACTION_READ_ALL, @@ -149,9 +150,10 @@ static ssize_t pstore_file_read(struct file *file, char __user *userbuf, struct seq_file *sf = file->private_data; struct pstore_private *ps = sf->private; - if (ps->type == PSTORE_TYPE_FTRACE) + if (ps->record->type == PSTORE_TYPE_FTRACE) return seq_read(file, userbuf, count, ppos); - return simple_read_from_buffer(userbuf, count, ppos, ps->buf, ps->size); + return simple_read_from_buffer(userbuf, count, ppos, + ps->record->buf, ps->total_size); } static int pstore_file_open(struct inode *inode, struct file *file) @@ -165,7 +167,7 @@ static int pstore_file_open(struct inode *inode, struct file *file) if (err) return err; - if (ps->type == PSTORE_TYPE_FTRACE) + if (ps->record->type == PSTORE_TYPE_FTRACE) sops = _ftrace_seq_ops; err = seq_open(file, sops); @@ -201,17 +203,18 @@ static const struct file_operations pstore_file_operations = { static int pstore_unlink(struct inode *dir, struct dentry *dentry) { struct pstore_private *p = d_inode(dentry)->i_private; + struct pstore_record *record = p->record; int err; err = pstore_check_syslog_permissions(p); if (err) return err; - if (p->psi->erase) { - mutex_lock(>psi->read_mutex); - p->psi->erase(p->type, p->id, p->count, - d_inode(dentry)->i_ctime, p->psi); - mutex_unlock(>psi->read_mutex); + if (record->psi->erase) { + mutex_lock(>psi->read_mutex); + record->psi->erase(record->type, record->id, record->count, + d_inode(dentry)->i_ctime, record->psi); + mutex_unlock(>psi->read_mutex); } else { return -EPERM; } @@ -323,9 +326,9 @@ int pstore_mkfile(struct pstore_r
[PATCH 06/18] pstore: Extract common arguments into structure
The read/mkfile pair pass the same arguments and should be cleared between calls. Move to a structure and wipe it after every loop. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 55 +++--- include/linux/pstore.h | 28 - 2 files changed, 57 insertions(+), 26 deletions(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 320a673ecb5b..3fa1575a6e36 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -766,16 +766,9 @@ EXPORT_SYMBOL_GPL(pstore_unregister); void pstore_get_records(int quiet) { struct pstore_info *psi = psinfo; - char*buf = NULL; - ssize_t size; - u64 id; - int count; - enum pstore_type_id type; - struct timespec time; + struct pstore_recordrecord = { .psi = psi, }; int failed = 0, rc; - boolcompressed; int unzipped_len = -1; - ssize_t ecc_notice_size = 0; if (!psi) return; @@ -784,39 +777,51 @@ void pstore_get_records(int quiet) if (psi->open && psi->open(psi)) goto out; - while ((size = psi->read(, , , , , , -_notice_size, psi)) > 0) { - if (compressed && (type == PSTORE_TYPE_DMESG)) { + while ((record.size = psi->read(, , +, , +, , +_notice_size, +record.psi)) > 0) { + if (record.compressed && + record.type == PSTORE_TYPE_DMESG) { if (big_oops_buf) - unzipped_len = pstore_decompress(buf, - big_oops_buf, size, + unzipped_len = pstore_decompress( + record.buf, + big_oops_buf, + record.size, big_oops_buf_sz); if (unzipped_len > 0) { - if (ecc_notice_size) + if (record.ecc_notice_size) memcpy(big_oops_buf + unzipped_len, - buf + size, ecc_notice_size); - kfree(buf); - buf = big_oops_buf; - size = unzipped_len; - compressed = false; + record.buf + recorrecord.size, + record.ecc_notice_size); + kfree(record.buf); + record.buf = big_oops_buf; + record.size = unzipped_len; + record.compressed = false; } else { pr_err("decompression failed;returned %d\n", unzipped_len); - compressed = true; + record.compressed = true; } } - rc = pstore_mkfile(type, psi->name, id, count, buf, - compressed, size + ecc_notice_size, - time, psi); + rc = pstore_mkfile(record.type, psi->name, record.id, + record.count, record.buf, + record.compressed, + record.size + record.ecc_notice_size, + record.time, record.psi); if (unzipped_len < 0) { /* Free buffer other than big oops */ - kfree(buf); - buf = NULL; + kfree(record.buf); + record.buf = NULL; } else unzipped_len = -1; if (rc && (rc != -EEXIST || !quiet)) failed++; + + memset(, 0, sizeof(record)); + record.psi = psi; } if (psi->close) psi->close(psi); diff --git a/include/linux/pstore.h b/include/linux/pstore.h index 083b10bacd4a..7b25f7f17915 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -30,6 +30,8 @@ #include #include +struct module; + /* pstore record types (see fs/pstore/inode.c for filename templates) */
[PATCH 09/18] pstore: Replace arguments for read() API
The argument list for the pstore_read() interface is unwieldy. This changes passes the new struct pstore_record instead. The erst backend was already doing something similar internally. Signed-off-by: Kees Cook <keesc...@chromium.org> --- arch/powerpc/kernel/nvram_64.c| 61 +++--- drivers/acpi/apei/erst.c | 38 ++ drivers/firmware/efi/efi-pstore.c | 104 -- fs/pstore/platform.c | 7 +-- fs/pstore/ram.c | 53 ++- include/linux/pstore.h| 20 +++- 6 files changed, 124 insertions(+), 159 deletions(-) diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c index d5e2b8309939..7f192001d09a 100644 --- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -442,10 +442,7 @@ static int nvram_pstore_write(enum pstore_type_id type, * Returns the length of the data we read from each partition. * Returns 0 if we've been called before. */ -static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, - int *count, struct timespec *time, char **buf, - bool *compressed, ssize_t *ecc_notice_size, - struct pstore_info *psi) +static ssize_t nvram_pstore_read(struct pstore_record *record) { struct oops_log_info *oops_hdr; unsigned int err_type, id_no, size = 0; @@ -459,40 +456,40 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, switch (nvram_type_ids[read_type]) { case PSTORE_TYPE_DMESG: part = _log_partition; - *type = PSTORE_TYPE_DMESG; + record->type = PSTORE_TYPE_DMESG; break; case PSTORE_TYPE_PPC_COMMON: sig = NVRAM_SIG_SYS; part = _partition; - *type = PSTORE_TYPE_PPC_COMMON; - *id = PSTORE_TYPE_PPC_COMMON; - time->tv_sec = 0; - time->tv_nsec = 0; + record->type = PSTORE_TYPE_PPC_COMMON; + record->id = PSTORE_TYPE_PPC_COMMON; + record->time.tv_sec = 0; + record->time.tv_nsec = 0; break; #ifdef CONFIG_PPC_PSERIES case PSTORE_TYPE_PPC_RTAS: part = _log_partition; - *type = PSTORE_TYPE_PPC_RTAS; - time->tv_sec = last_rtas_event; - time->tv_nsec = 0; + record->type = PSTORE_TYPE_PPC_RTAS; + record->time.tv_sec = last_rtas_event; + record->time.tv_nsec = 0; break; case PSTORE_TYPE_PPC_OF: sig = NVRAM_SIG_OF; part = _config_partition; - *type = PSTORE_TYPE_PPC_OF; - *id = PSTORE_TYPE_PPC_OF; - time->tv_sec = 0; - time->tv_nsec = 0; + record->type = PSTORE_TYPE_PPC_OF; + record->id = PSTORE_TYPE_PPC_OF; + record->time.tv_sec = 0; + record->time.tv_nsec = 0; break; #endif #ifdef CONFIG_PPC_POWERNV case PSTORE_TYPE_PPC_OPAL: sig = NVRAM_SIG_FW; part = _partition; - *type = PSTORE_TYPE_PPC_OPAL; - *id = PSTORE_TYPE_PPC_OPAL; - time->tv_sec = 0; - time->tv_nsec = 0; + record->type = PSTORE_TYPE_PPC_OPAL; + record->id = PSTORE_TYPE_PPC_OPAL; + record->time.tv_sec = 0; + record->time.tv_nsec = 0; break; #endif default: @@ -520,10 +517,10 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, return 0; } - *count = 0; + record->count = 0; if (part->os_partition) - *id = id_no; + record->id = id_no; if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) { size_t length, hdr_size; @@ -533,28 +530,28 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, /* Old format oops header had 2-byte record size */ hdr_size = sizeof(u16); length = be16_to_cpu(oops_hdr->version); - time->tv_sec = 0; - time->tv_nsec = 0; + record->time.tv_sec = 0; + record->time.tv_nsec = 0; } else { hdr_size = sizeof(*oops_hdr); length = be16_to_cpu(oops_hdr->report_length); - time->tv_sec = be64_to_cpu(oops_hdr->timestamp); - time->tv_nsec = 0; + record->time.tv_sec = be64_to_cpu(oops_hdr->timest
[PATCH 05/18] pstore: Add kernel-doc for struct pstore_info
This adds documentation for struct pstore_info, which also includes the basic API the backends need to implement. Signed-off-by: Kees Cook <keesc...@chromium.org> --- include/linux/pstore.h | 133 +++-- 1 file changed, 128 insertions(+), 5 deletions(-) diff --git a/include/linux/pstore.h b/include/linux/pstore.h index 0da29cae009b..083b10bacd4a 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -30,7 +30,7 @@ #include #include -/* types */ +/* pstore record types (see fs/pstore/inode.c for filename templates) */ enum pstore_type_id { PSTORE_TYPE_DMESG = 0, PSTORE_TYPE_MCE = 1, @@ -47,14 +47,138 @@ enum pstore_type_id { struct module; +/** + * struct pstore_info - backend pstore driver structure + * + * @owner: module which is repsonsible for this backend driver + * @name: name of the backend driver + * + * @buf_lock: spinlock to serialize access to @buf + * @buf: preallocated crash dump buffer + * @bufsize: size of @buf available for crash dump writes + * + * @read_mutex:mutex to serialize @open, @read, and @close callbacks + * @flags: bitfield of frontends the backend can accept writes for + * @data: backend-private pointer passed back during callbacks + * + * Callbacks: + * + * @open: + * Notify backend that pstore is starting a full read of backend + * records. Followed by one or more @read calls, and a final @close. + * + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns 0 on success, and non-zero on error. + * + * @close: + * Notify backend that pstore has finished a full read of backend + * records. Always preceded by an @open call and one or more @read + * calls. + * + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns 0 on success, and non-zero on error. (Though pstore will + * ignore the error.) + * + * @read: + * Read next available backend record. Called after a successful + * @open. + * + * @id:out: unique identifier for the record + * @type: out: pstore record type + * @count: out: for PSTORE_TYPE_DMESG, the Oops count. + * @time: out: timestamp for the record + * @buf: out: kmalloc copy of record contents, to be freed by pstore + * @compressed: + * out: if the record contents are compressed + * @ecc_notice_size: + * out: ECC information + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns record size on success, zero when no more records are + * available, or negative on error. + * + * @write: + * Perform a frontend notification of a write to a backend record. The + * data to be stored has already been written to the registered @buf + * of the @psi structure. + * + * @type: in: pstore record type to write + * @reason: + * in: pstore write reason + * @id:out: unique identifier for the record + * @part: in: position in a multipart write + * @count: in: increasing from 0 since boot, the number of this Oops + * @compressed: + * in: if the record is compressed + * @size: in: size of the write + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns 0 on success, and non-zero on error. + * + * @write_buf: + * Perform a frontend write to a backend record, using a specified + * buffer. Unlike @write, this does not use the @psi @buf. + * + * @type: in: pstore record type to write + * @reason: + * in: pstore write reason + * @id:out: unique identifier for the record + * @part: in: position in a multipart write + * @buf: in: pointer to contents to write to backend record + * @compressed: + * in: if the record is compressed + * @size: in: size of the write + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns 0 on success, and non-zero on error. + * + * @write_buf_user: + * Perform a frontend write to a backend record, using a specified + * buffer that is coming directly from userspace. + * + * @type: in: pstore record type to write + * @reason: + * in: pstore write reason + * @id:out: unique identifier for the record + * @part: in: position in a multipart write + * @buf: in: pointer to userspace contents to write to backend record + * @compressed: + * in: if the record is compressed + * @size: in: size of the write + * @psi: in: pointer to the struct pstore_info for the backend + * + * Returns 0 on success, and non-zero on error. + * + * @erase: + * Delete a record from backend storage. Different backends + * identify records differently, so all possible methods of + * identification are included to help the backend locate the + * record to
[PATCH 07/18] pstore: Move record decompression to function
This moves the record decompression logic out to a separate function to avoid the deep indentation. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 67 +--- 1 file changed, 37 insertions(+), 30 deletions(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 3fa1575a6e36..0503380704de 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -757,6 +757,37 @@ void pstore_unregister(struct pstore_info *psi) } EXPORT_SYMBOL_GPL(pstore_unregister); +static void decompress_record(struct pstore_record *record) +{ + int unzipped_len; + + /* Only PSTORE_TYPE_DMESG support compression. */ + if (!record->compressed || record->type != PSTORE_TYPE_DMESG) { + pr_warn("ignored compressed record type %d\n", record->type); + return; + } + + /* No compression method has created the common buffer. */ + if (!big_oops_buf) { + pr_warn("no decompression buffer allocated\n"); + return; + } + + unzipped_len = pstore_decompress(record->buf, big_oops_buf, +record->size, big_oops_buf_sz); + if (unzipped_len > 0) { + if (record->ecc_notice_size) + memcpy(big_oops_buf + unzipped_len, + record->buf + record->size, + record->ecc_notice_size); + kfree(record->buf); + record->buf = big_oops_buf; + record->size = unzipped_len; + record->compressed = false; + } else + pr_err("decompression failed: %d\n", unzipped_len); +} + /* * Read all the records from the persistent store. Create * files in our filesystem. Don't warn about -EEXIST errors @@ -768,7 +799,6 @@ void pstore_get_records(int quiet) struct pstore_info *psi = psinfo; struct pstore_recordrecord = { .psi = psi, }; int failed = 0, rc; - int unzipped_len = -1; if (!psi) return; @@ -782,41 +812,18 @@ void pstore_get_records(int quiet) , , _notice_size, record.psi)) > 0) { - if (record.compressed && - record.type == PSTORE_TYPE_DMESG) { - if (big_oops_buf) - unzipped_len = pstore_decompress( - record.buf, - big_oops_buf, - record.size, - big_oops_buf_sz); - - if (unzipped_len > 0) { - if (record.ecc_notice_size) - memcpy(big_oops_buf + unzipped_len, - record.buf + recorrecord.size, - record.ecc_notice_size); - kfree(record.buf); - record.buf = big_oops_buf; - record.size = unzipped_len; - record.compressed = false; - } else { - pr_err("decompression failed;returned %d\n", - unzipped_len); - record.compressed = true; - } - } + + decompress_record(); rc = pstore_mkfile(record.type, psi->name, record.id, record.count, record.buf, record.compressed, record.size + record.ecc_notice_size, record.time, record.psi); - if (unzipped_len < 0) { - /* Free buffer other than big oops */ + + /* Free buffer other than big oops */ + if (record.buf != big_oops_buf) kfree(record.buf); - record.buf = NULL; - } else - unzipped_len = -1; + if (rc && (rc != -EEXIST || !quiet)) failed++; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/18] pstore: Replace arguments for write_buf_user() API
Removes argument list in favor of pstore record, though the user buffer remains passed separately since it must carry the __user annotation. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/platform.c | 35 --- fs/pstore/pmsg.c | 9 ++--- fs/pstore/ram.c| 14 +- include/linux/pstore.h | 23 +++ 4 files changed, 30 insertions(+), 51 deletions(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 5eecf9012459..1e6642a2063e 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -639,47 +639,36 @@ static int pstore_write_compat(struct pstore_record *record) return record->psi->write_buf(record); } -static int pstore_write_buf_user_compat(enum pstore_type_id type, - enum kmsg_dump_reason reason, - u64 *id, unsigned int part, - const char __user *buf, - bool compressed, size_t size, - struct pstore_info *psi) +static int pstore_write_buf_user_compat(struct pstore_record *record, + const char __user *buf) { unsigned long flags = 0; - size_t i, bufsize = size; + size_t i, bufsize, total_size = record->size; long ret = 0; - if (unlikely(!access_ok(VERIFY_READ, buf, size))) + if (unlikely(!access_ok(VERIFY_READ, buf, total_size))) return -EFAULT; + bufsize = total_size; if (bufsize > psinfo->bufsize) bufsize = psinfo->bufsize; + record->buf = psinfo->buf; spin_lock_irqsave(>buf_lock, flags); - for (i = 0; i < size; ) { - struct pstore_record record = { - .type = type, - .reason = reason, - .id = id, - .part = part, - .buf = psinfo->buf, - .compressed = compressed, - .psi = psi, - }; - size_t c = min(size - i, bufsize); + for (i = 0; i < total_size; ) { + size_t c = min(total_size - i, bufsize); - ret = __copy_from_user(psinfo->buf, buf + i, c); + ret = __copy_from_user(record->buf, buf + i, c); if (unlikely(ret != 0)) { ret = -EFAULT; break; } - record.size = c; - ret = psi->write_buf(); + record->size = c; + ret = record->psi->write_buf(record); if (unlikely(ret < 0)) break; i += c; } spin_unlock_irqrestore(>buf_lock, flags); - return unlikely(ret < 0) ? ret : size; + return unlikely(ret < 0) ? ret : total_size; } /* diff --git a/fs/pstore/pmsg.c b/fs/pstore/pmsg.c index 78f6176c020f..ce35907602de 100644 --- a/fs/pstore/pmsg.c +++ b/fs/pstore/pmsg.c @@ -23,7 +23,11 @@ static DEFINE_MUTEX(pmsg_lock); static ssize_t write_pmsg(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - u64 id; + struct pstore_record record = { + .type = PSTORE_TYPE_PMSG, + .size = count, + .psi = psinfo, + }; int ret; if (!count) @@ -34,8 +38,7 @@ static ssize_t write_pmsg(struct file *file, const char __user *buf, return -EFAULT; mutex_lock(_lock); - ret = psinfo->write_buf_user(PSTORE_TYPE_PMSG, 0, , 0, buf, 0, count, -psinfo); + ret = psinfo->write_buf_user(, buf); mutex_unlock(_lock); return ret ? ret : count; } diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index a7cdde60b1f9..d85e1adae1b6 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -451,19 +451,15 @@ static int notrace ramoops_pstore_write_buf(struct pstore_record *record) return 0; } -static int notrace ramoops_pstore_write_buf_user(enum pstore_type_id type, -enum kmsg_dump_reason reason, -u64 *id, unsigned int part, -const char __user *buf, -bool compressed, size_t size, -struct pstore_info *psi) +static int notrace ramoops_pstore_write_buf_user(struct pstore_record *record, +const char __user *buf) { - if (type == PSTORE_TYPE_PMSG) { - struct ramoops_context *cxt = psi->data; + if (record->type == PSTORE_TYPE_PMSG) { + struct ramoops_context *cxt = record-
[PATCH 16/18] pstore: Replace arguments for write_buf() API
As with the other API updates, this removes the long argument list in favor of passing a single pstore recaord. Signed-off-by: Kees Cook <keesc...@chromium.org> --- fs/pstore/ftrace.c | 9 +++-- fs/pstore/platform.c | 30 +- fs/pstore/ram.c| 44 ++-- include/linux/pstore.h | 21 + 4 files changed, 55 insertions(+), 49 deletions(-) diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index 899d0ba0bd6c..a5506ec6995e 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -37,6 +37,12 @@ static void notrace pstore_ftrace_call(unsigned long ip, { unsigned long flags; struct pstore_ftrace_record rec = {}; + struct pstore_record record = { + .type = PSTORE_TYPE_FTRACE, + .buf = (char *), + .size = sizeof(rec), + .psi = psinfo, + }; if (unlikely(oops_in_progress)) return; @@ -47,8 +53,7 @@ static void notrace pstore_ftrace_call(unsigned long ip, rec.parent_ip = parent_ip; pstore_ftrace_write_timestamp(, pstore_ftrace_stamp++); pstore_ftrace_encode_cpu(, raw_smp_processor_id()); - psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *), - 0, sizeof(rec), psinfo); + psinfo->write_buf(); local_irq_restore(flags); } diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index aa3d6e572ede..5eecf9012459 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -587,8 +587,11 @@ static void pstore_console_write(struct console *con, const char *s, unsigned c) const char *e = s + c; while (s < e) { + struct pstore_record record = { + .type = PSTORE_TYPE_CONSOLE, + .psi = psinfo, + }; unsigned long flags; - u64 id; if (c > psinfo->bufsize) c = psinfo->bufsize; @@ -599,8 +602,9 @@ static void pstore_console_write(struct console *con, const char *s, unsigned c) } else { spin_lock_irqsave(>buf_lock, flags); } - psinfo->write_buf(PSTORE_TYPE_CONSOLE, 0, , 0, - s, 0, c, psinfo); + record.buf = (char *)s; + record.size = c; + psinfo->write_buf(); spin_unlock_irqrestore(>buf_lock, flags); s += c; c = e - s; @@ -630,10 +634,9 @@ static void pstore_unregister_console(void) {} static int pstore_write_compat(struct pstore_record *record) { - return record->psi->write_buf(record->type, record->reason, - >id, record->part, - psinfo->buf, record->compressed, - record->size, record->psi); + record->buf = psinfo->buf; + + return record->psi->write_buf(record); } static int pstore_write_buf_user_compat(enum pstore_type_id type, @@ -653,6 +656,15 @@ static int pstore_write_buf_user_compat(enum pstore_type_id type, bufsize = psinfo->bufsize; spin_lock_irqsave(>buf_lock, flags); for (i = 0; i < size; ) { + struct pstore_record record = { + .type = type, + .reason = reason, + .id = id, + .part = part, + .buf = psinfo->buf, + .compressed = compressed, + .psi = psi, + }; size_t c = min(size - i, bufsize); ret = __copy_from_user(psinfo->buf, buf + i, c); @@ -660,8 +672,8 @@ static int pstore_write_buf_user_compat(enum pstore_type_id type, ret = -EFAULT; break; } - ret = psi->write_buf(type, reason, id, part, psinfo->buf, -compressed, c, psi); + record.size = c; + ret = psi->write_buf(); if (unlikely(ret < 0)) break; i += c; diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index a18575fe32e9..a7cdde60b1f9 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -378,23 +378,18 @@ static size_t ramoops_write_kmsg_hdr(struct persistent_ram_zone *prz, return len; } -static int notrace ramoops_pstore_write_buf(enum pstore_type_id type, - enum kmsg_dump_reason reason, - u64 *id, unsigned int part, - const char *buf, - bool compress
[PATCH 10/18] pstore: Replace arguments for write() API
Similar to the pstore_info read() callback, there were too many arguments. This switches to the new struct pstore_record pointer instead. This adds "reason" and "part" to the record structure as well. Signed-off-by: Kees Cook <keesc...@chromium.org> --- arch/powerpc/kernel/nvram_64.c| 27 + drivers/acpi/apei/erst.c | 18 +--- drivers/firmware/efi/efi-pstore.c | 18 +--- fs/pstore/platform.c | 62 ++- include/linux/pstore.h| 36 +++ 5 files changed, 76 insertions(+), 85 deletions(-) diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c index 7f192001d09a..caf2e1f36d6b 100644 --- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -389,51 +389,40 @@ static int nvram_pstore_open(struct pstore_info *psi) /** * nvram_pstore_write - pstore write callback for nvram - * @type: Type of message logged - * @reason: reason behind dump (oops/panic) - * @id: identifier to indicate the write performed - * @part: pstore writes data to registered buffer in parts, - * part number will indicate the same. - * @count: Indicates oops count - * @compressed: Flag to indicate the log is compressed - * @size: number of bytes written to the registered buffer - * @psi:registered pstore_info structure + * @record: pstore record to write, with @id to be set * * Called by pstore_dump() when an oops or panic report is logged in the * printk buffer. * Returns 0 on successful write. */ -static int nvram_pstore_write(enum pstore_type_id type, - enum kmsg_dump_reason reason, - u64 *id, unsigned int part, int count, - bool compressed, size_t size, - struct pstore_info *psi) +static int nvram_pstore_write(struct pstore_record *record) { int rc; unsigned int err_type = ERR_TYPE_KERNEL_PANIC; struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf; /* part 1 has the recent messages from printk buffer */ - if (part > 1 || (type != PSTORE_TYPE_DMESG)) + if (record->part > 1 || (record->type != PSTORE_TYPE_DMESG)) return -1; if (clobbering_unread_rtas_event()) return -1; oops_hdr->version = cpu_to_be16(OOPS_HDR_VERSION); - oops_hdr->report_length = cpu_to_be16(size); + oops_hdr->report_length = cpu_to_be16(record->size); oops_hdr->timestamp = cpu_to_be64(ktime_get_real_seconds()); - if (compressed) + if (record->compressed) err_type = ERR_TYPE_KERNEL_PANIC_GZ; rc = nvram_write_os_partition(_log_partition, oops_buf, - (int) (sizeof(*oops_hdr) + size), err_type, count); + (int) (sizeof(*oops_hdr) + record->size), err_type, + record->count); if (rc != 0) return rc; - *id = part; + record->id = record->part; return 0; } diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index bbefb7522f80..440588d189e7 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -926,9 +926,7 @@ static int erst_check_table(struct acpi_table_erst *erst_tab) static int erst_open_pstore(struct pstore_info *psi); static int erst_close_pstore(struct pstore_info *psi); static ssize_t erst_reader(struct pstore_record *record); -static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, - u64 *id, unsigned int part, int count, bool compressed, - size_t size, struct pstore_info *psi); +static int erst_writer(struct pstore_record *record); static int erst_clearer(enum pstore_type_id type, u64 id, int count, struct timespec time, struct pstore_info *psi); @@ -1054,9 +1052,7 @@ static ssize_t erst_reader(struct pstore_record *record) return (rc < 0) ? rc : (len - sizeof(*rcd)); } -static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, - u64 *id, unsigned int part, int count, bool compressed, - size_t size, struct pstore_info *psi) +static int erst_writer(struct pstore_record *record) { struct cper_pstore_record *rcd = (struct cper_pstore_record *) (erst_info.buf - sizeof(*rcd)); @@ -1071,21 +1067,21 @@ static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, /* timestamp valid. platform_id, partition_id are invalid */ rcd->hdr.validation_bits = CPER_VALID_TIMESTAMP; rcd->hdr.timestamp = get_seconds(); - rcd->
Re: [PATCH 1/2] x86: Fix kernel panic when booting with XD disabled in uEFI firmware
On Tue, Dec 8, 2015 at 6:19 AM, Borislav Petkov <b...@alien8.de> wrote: > On Tue, Dec 08, 2015 at 12:25:57PM +, Matt Fleming wrote: >> On Mon, 07 Dec, at 11:10:43PM, Kosuke Tatsukawa wrote: >> > >> > Thank you pointing that out. >> > >> > linux-4.4-rc3 booted without a problem on a real server even with XD >> > turned off by the firmware. I didn't notice this before because I was > > The aforementioned patch reenables NX. > >> Borislav, what do you think about stripping PAGE_NX from 'page_flags' >> in kernel_map_pages_in_pgd() if NX isn't supported, rather than >> returning EINVAL? At least that way EFI runtime services would still >> work. > > I guess we can - I mean, I don't see what can go wrong more when > allowing the kernel to execute even NX UEFI regions. Maybe easier > generation of "gadgets" in the ROP sense ... > > On a related node, I'm very sceptical of the existence of this "noexec" > chicken bit, if you ask me. It is a really bad idea, security-wise, to > disable NX. Is there even a valid use case to disable NX? > > Because if not, I'd vote for removing that chicken bit or at least > taining the kernel with > > add_taint(TAINT_USER_MORON, ... ); If we add this for not-nx, I would like to add it for not-rodata too. > Kees, has this NX disabling practice come up in the past, per chance... ? I've never seen anyone actually use it. I was asked to include it out of fear of some kind of rogue imagined CPU configuration that mixed NX and non-NX capable CPUs in a single machine where the forced NX re-enablement code would cause problems. As you might imagine, I'm not aware of this case ever being an issue. ;) -Kees -- Kees Cook Chrome OS & Brillo Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] efi-pstore: implement efivars_pstore_exit()
On Wed, Nov 11, 2015 at 3:23 PM, Luck, Tony <tony.l...@intel.com> wrote: >>>> module_init(efivars_pstore_init); >>> >>> Looks OK to me. Kees, are you picking this up? >> >> I can, though usually it goes through Tony. > > Can I count that as "Acked-by" from both of you? Yup, sorry. I thought I'd acked them already. :) My bad! Acked-by: Kees Cook <keesc...@chromium.org> -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] efi-pstore: implement efivars_pstore_exit()
On Wed, Nov 11, 2015 at 8:59 AM, Matt Fleming <m...@codeblueprint.co.uk> wrote: > On Sat, 07 Nov, at 12:43:48PM, Geliang Tang wrote: >> The original efivars_pstore_exit() is empty. I >> 1) add a bufsize check statement. >> 2) call pstore_unregister as it is defined now. >> 3) free the memory and set bufsize to 0. >> >> Signed-off-by: Geliang Tang <geliangt...@163.com> >> --- >> drivers/firmware/efi/efi-pstore.c | 7 +++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/drivers/firmware/efi/efi-pstore.c >> b/drivers/firmware/efi/efi-pstore.c >> index eac76a7..62d57d8 100644 >> --- a/drivers/firmware/efi/efi-pstore.c >> +++ b/drivers/firmware/efi/efi-pstore.c >> @@ -393,6 +393,13 @@ static __init int efivars_pstore_init(void) >> >> static __exit void efivars_pstore_exit(void) >> { >> + if (!efi_pstore_info.bufsize) >> + return; >> + >> + pstore_unregister(_pstore_info); >> + kfree(efi_pstore_info.buf); >> + efi_pstore_info.buf = NULL; >> + efi_pstore_info.bufsize = 0; >> } >> >> module_init(efivars_pstore_init); > > Looks OK to me. Kees, are you picking this up? I can, though usually it goes through Tony. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] x86/mm changes for v4.4
On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 9 November 2015 at 22:08, Kees Cook <keesc...@chromium.org> wrote: >> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 8 November 2015 at 07:58, Kees Cook <keesc...@chromium.org> wrote: >>>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >>>> <ard.biesheu...@linaro.org> wrote: >>>>> On 7 November 2015 at 08:09, Ingo Molnar <mi...@kernel.org> wrote: >>>>>> >>>>>> * Matt Fleming <m...@codeblueprint.co.uk> wrote: >>>>>> >>>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>>>>>> > >>>>>>> > 3) We should fix the EFI permission problem without relying on the >>>>>>> > firmware: it >>>>>>> > appears we could just mark everything R-X optimistically, and if >>>>>>> > a write fault >>>>>>> > happens (it's pretty rare in fact, only triggers when we write to >>>>>>> > an EFI >>>>>>> > variable and so), we can mark the faulting page RW- on the fly, >>>>>>> > because it >>>>>>> > appears that writable EFI sections, while not enumerated very >>>>>>> > well in 'old' >>>>>>> > firmware, are still supposed to be page granular. (Even 'new' >>>>>>> > firmware I >>>>>>> > wouldn't automatically trust to get the enumeration right...) >>>>>>> >>>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on >>>>>>> this topic. Let me try and clear things up... >>>>>>> >>>>>>> Writing to EFI regions has to do with every invocation of the EFI >>>>>>> runtime services - it's not limited to when you read/write/delete EFI >>>>>>> variables. In fact, EFI variables really have nothing to do with this >>>>>>> discussion, they're a completely opaque concept to the OS, we have no >>>>>>> idea how the firmware implements them. Everything is done via the EFI >>>>>>> boot/runtime services. >>>>>>> >>>>>>> The firmware itself will attempt to write to EFI regions when we >>>>>>> invoke the EFI services because that's where the PE/COFF ".data" and >>>>>>> ".bss" sections live along with the heap. There's even some relocation >>>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>>>>>> ".text" too. >>>>>>> >>>>>>> Now, the above PE/COFF sections are usually (always?) contained within >>>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>>>>>> because the firmware folks have told us so, and because stopping that >>>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>>>>>> V2.5. >>>>>>> >>>>>>> The data sections within the region are also *not* guaranteed to be >>>>>>> page granular because work was required in Tianocore for emitting >>>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>>>>>> support. >>>>>>> >>>>>>> Ultimately, what this means is that if you were to attempt to >>>>>>> dynamically fixup those regions that required write permission, you'd >>>>>>> have to modify the mappings for the majority of the EFI regions >>>>>>> anyway. And if you're blindly allowing write permission as a fixup, >>>>>>> there's not much security to be had. >>>>>> >>>>>> I think you misunderstood my suggestion: the 'fixup' would be changing >>>>>> it from R-X >>>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' >>>>>> permission. >>>>>> >>>>>> Note that there would be no 'RWX' permission at any given moment - which >>>>>> is the >>>>>> dangerous combination. >>>>>> >>>>> >>>>> The problem with that is that /any/ page in the UEFI runtime region &g
Re: [GIT PULL] x86/mm changes for v4.4
On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 8 November 2015 at 07:58, Kees Cook <keesc...@chromium.org> wrote: >> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 7 November 2015 at 08:09, Ingo Molnar <mi...@kernel.org> wrote: >>>> >>>> * Matt Fleming <m...@codeblueprint.co.uk> wrote: >>>> >>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>>>> > >>>>> > 3) We should fix the EFI permission problem without relying on the >>>>> > firmware: it >>>>> > appears we could just mark everything R-X optimistically, and if a >>>>> > write fault >>>>> > happens (it's pretty rare in fact, only triggers when we write to >>>>> > an EFI >>>>> > variable and so), we can mark the faulting page RW- on the fly, >>>>> > because it >>>>> > appears that writable EFI sections, while not enumerated very well >>>>> > in 'old' >>>>> > firmware, are still supposed to be page granular. (Even 'new' >>>>> > firmware I >>>>> > wouldn't automatically trust to get the enumeration right...) >>>>> >>>>> Sorry, this isn't true. I misled you with one of my earlier posts on >>>>> this topic. Let me try and clear things up... >>>>> >>>>> Writing to EFI regions has to do with every invocation of the EFI >>>>> runtime services - it's not limited to when you read/write/delete EFI >>>>> variables. In fact, EFI variables really have nothing to do with this >>>>> discussion, they're a completely opaque concept to the OS, we have no >>>>> idea how the firmware implements them. Everything is done via the EFI >>>>> boot/runtime services. >>>>> >>>>> The firmware itself will attempt to write to EFI regions when we >>>>> invoke the EFI services because that's where the PE/COFF ".data" and >>>>> ".bss" sections live along with the heap. There's even some relocation >>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>>>> ".text" too. >>>>> >>>>> Now, the above PE/COFF sections are usually (always?) contained within >>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>>>> because the firmware folks have told us so, and because stopping that >>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>>>> V2.5. >>>>> >>>>> The data sections within the region are also *not* guaranteed to be >>>>> page granular because work was required in Tianocore for emitting >>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>>>> support. >>>>> >>>>> Ultimately, what this means is that if you were to attempt to >>>>> dynamically fixup those regions that required write permission, you'd >>>>> have to modify the mappings for the majority of the EFI regions >>>>> anyway. And if you're blindly allowing write permission as a fixup, >>>>> there's not much security to be had. >>>> >>>> I think you misunderstood my suggestion: the 'fixup' would be changing it >>>> from R-X >>>> to RW-, i.e. it would add 'write' permission but remove 'execute' >>>> permission. >>>> >>>> Note that there would be no 'RWX' permission at any given moment - which >>>> is the >>>> dangerous combination. >>>> >>> >>> The problem with that is that /any/ page in the UEFI runtime region >>> may intersect with both .text and .data of any of the PE/COFF images >>> that make up the runtime firmware (since the PE/COFF sections are not >>> necessarily page aligned). Such pages require RWX permissions. The >>> UEFI memory map does not provide the information to identify those >>> pages a priori (the entire region containing several PE/COFF images >>> could be covered by a single entry) so it is hard to guess which pages >>> should be allowed these RWX permissions. >> >> I'm sad that UEFI was designed without even the most basic of memory >> protections in mind. UEFI _itself_ should be setting up protective >> page mappings. :
Re: [PATCH 1/2] efi-pstore: fix kernel-doc argument name
On Sat, Oct 31, 2015 at 8:23 AM, Geliang Tang <geliangt...@163.com> wrote: > The first argument name in the kernel-doc argument list for > efi_pstore_scan_sysfs_enter() was slightly off. Fix it for the > kernel doc. > > Signed-off-by: Geliang Tang <geliangt...@163.com> Acked-by: Kees Cook <keesc...@chromium.org> Thanks! -Kees > --- > drivers/firmware/efi/efi-pstore.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/firmware/efi/efi-pstore.c > b/drivers/firmware/efi/efi-pstore.c > index c8d794c..eac76a7 100644 > --- a/drivers/firmware/efi/efi-pstore.c > +++ b/drivers/firmware/efi/efi-pstore.c > @@ -103,7 +103,7 @@ static int efi_pstore_read_func(struct efivar_entry > *entry, void *data) > > /** > * efi_pstore_scan_sysfs_enter > - * @entry: scanning entry > + * @pos: scanning entry > * @next: next entry > * @head: list head > */ > -- > 2.4.3 > > -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
On Tue, Mar 10, 2015 at 1:00 AM, Borislav Petkov b...@suse.de wrote: Final patch: --- From: Yinghai Lu ying...@kernel.org Date: Sat, 7 Mar 2015 14:07:16 -0800 Subject: [PATCH] x86/setup: Move compressed kernel to the end of the buffer Boris found that passing KASLR status through setup_data from the boot stage cannot be used later in the kernel stage, see commit f47233c2d34f (x86/mm/ASLR: Propagate base load address calculation) Here's some background: The boot loader allocates a buffer of size init_size in concordance with the value passed in the setup header and it loads the compressed, i.e. first kernel (arch/x86/boot/compressed/vmlinux) in it. First kernel then moves itself somewhere around the middle of the buffer at z_extract_offset to make sure that the decompressor does not overwrite input data. After the decompressor is finished, kernel proper (vmlinux) uses the whole buffer from the beginning and the compressed kernel's code and data section is overlapped with the kernel proper's bss section. Later on, clear_bss() in kernel proper clears .bss before code in arch/x86/kernel/setup.c can access setup_data passed in the first, compressed kernel. To make sure that data survives, we should avoid the overlapping. As a first step, move the first kernel closer to the end of the buffer instead of the middle. As a result, this will place first kernel's data area out of kernel proper's .bss area. This way we can find out where the data section of the copied first kernel is instead of guessing. In addition, it will make the KASLR mem_avoid array preparation for the search of a fitting buffer much simpler. While at it, rename z_extract_offset to z_min_extract_offset as it is actually the minimum extract offset now. In order to keep the final extract offset page-aligned we need to make both kernels' _end markers page-aligned too so that init_size is page-aligned as a result. Signed-off-by: Yinghai Lu ying...@kernel.org Cc: H. Peter Anvin h...@zytor.com Cc: Matt Fleming matt.flem...@intel.com Cc: Kees Cook keesc...@chromium.org Cc: Thomas Gleixner t...@linutronix.de Cc: Jiri Kosina jkos...@suse.cz Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar mi...@redhat.com Cc: Baoquan He b...@redhat.com Fixes: f47233c2d34f (x86/mm/ASLR: Propagate base load address calculation) Link: http://lkml.kernel.org/r/1425766041-6551-3-git-send-email-ying...@kernel.org [ Commit message massively rewritten ] Acked-by: Kees Cook keesc...@chromium.org Thanks! -Kees Signed-off-by: --- arch/x86/boot/compressed/head_32.S | 11 +-- arch/x86/boot/compressed/head_64.S | 8 ++-- arch/x86/boot/compressed/mkpiggy.c | 7 ++- arch/x86/boot/compressed/vmlinux.lds.S | 1 + arch/x86/boot/header.S | 2 +- arch/x86/kernel/asm-offsets.c | 1 + arch/x86/kernel/vmlinux.lds.S | 1 + 7 files changed, 21 insertions(+), 10 deletions(-) diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index cbed1407a5cd..a9b56f1d8e75 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -147,7 +147,9 @@ preferred_addr: 1: /* Target address to relocate to for decompression */ - addl$z_extract_offset, %ebx + movlBP_init_size(%esi), %eax + subl$_end, %eax + addl%eax, %ebx /* Set up the stack */ lealboot_stack_end(%ebx), %esp @@ -208,8 +210,13 @@ relocated: */ /* push arguments for decompress_kernel: */ pushl $z_output_len /* decompressed length */ - lealz_extract_offset_negative(%ebx), %ebp + + movlBP_init_size(%esi), %eax + subl$_end, %eax + movl%ebx, %ebp + subl%eax, %ebp pushl %ebp/* output address */ + pushl $z_input_len/* input_len */ lealinput_data(%ebx), %eax pushl %eax/* input_data */ diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 2884e0c3e8a5..69015b576cf6 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -101,7 +101,9 @@ ENTRY(startup_32) 1: /* Target address to relocate to for decompression */ - addl$z_extract_offset, %ebx + movlBP_init_size(%esi), %eax + subl$_end, %eax + addl%eax, %ebx /* * Prepare for entering 64 bit mode @@ -329,7 +331,9 @@ preferred_addr: 1: /* Target address to relocate to for decompression */ - leaqz_extract_offset(%rbp), %rbx + movlBP_init_size(%rsi), %ebx + subl$_end, %ebx + addq%rbp, %rbx /* Set up the stack */ leaqboot_stack_end(%rbx), %rsp diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c index
Re: [PATCH v3 0/7] x86, boot: clean up kasl
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: First 3 patches make ZO (arch/x86/boot/compressed/vmlinux) data region is not overwritten by VO (vmlinux) after decompress. So could pass data from ZO to VO. Tiny nit-pick, in case there is a v5: the Subject on this (0/7) says kasl instead of kaslr. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: commit e6023367d779 (x86, kaslr: Prevent .bss from overlaping initrd) introduced one run_size for kaslr. We should use real runtime size (include copy/decompress) aka init_size. run_size is VO (vmlinux) init size include bss and brk. init_size is the size needed for decompress and it is bigger than run_size when decompress need more buff. According to arch/x86/boot/header.S: | #define ZO_INIT_SIZE(ZO__end - ZO_startup_32 + ZO_z_extract_offset) | #define VO_INIT_SIZE(VO__end - VO__text) | #if ZO_INIT_SIZE VO_INIT_SIZE | #define INIT_SIZE ZO_INIT_SIZE | #else | #define INIT_SIZE VO_INIT_SIZE | #endif | init_size: .long INIT_SIZE # kernel initialization size Bootloader allocate buffer according to init_size in hdr, and load the ZO (arch/x86/boot/compressed/vmlinux) from start of that buffer. init_size first come from VO (vmlinux) init size. That VO init size is from VO _end to VO _end and include VO bss and brk area. During running of ZO, ZO move itself to the middle of buffer at z_extract_offset to make sure that decompressor would not have output overwrite input data before input data get consumed. But z_extract_offset calculating is based on size of VO (vmlinux) and size of compressed VO only at first. So need to make sure [z_extra_offset, init_size) will fit ZO, that means init_size need to be adjusted according to ZO size. That make init_size is always = run_size. During aslr buffer searching, we need to make sure the new buffer is big enough for decompress at first. So use init_size instead, and kill not needed run_size related code. Fixes: e6023367d779 (x86, kaslr: Prevent .bss from overlaping initrd) Cc: H. Peter Anvin h...@zytor.com Cc: Josh Triplett j...@joshtriplett.org Cc: Matt Fleming matt.flem...@intel.com Cc: Kees Cook keesc...@chromium.org Cc: Andrew Morton a...@linux-foundation.org Cc: Ard Biesheuvel ard.biesheu...@linaro.org Cc: Junjie Mao eternal@gmail.com Signed-off-by: Yinghai Lu ying...@kernel.org Acked-by: Kees Cook keesc...@chromium.org --- arch/x86/boot/compressed/Makefile | 4 +--- arch/x86/boot/compressed/head_32.S | 5 ++--- arch/x86/boot/compressed/head_64.S | 5 + arch/x86/boot/compressed/misc.c| 15 +++--- arch/x86/boot/compressed/mkpiggy.c | 9 ++-- arch/x86/tools/calc_run_size.sh| 42 -- 6 files changed, 13 insertions(+), 67 deletions(-) delete mode 100644 arch/x86/tools/calc_run_size.sh diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 0a291cd..70cc92c 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -92,10 +92,8 @@ suffix-$(CONFIG_KERNEL_XZ) := xz suffix-$(CONFIG_KERNEL_LZO):= lzo suffix-$(CONFIG_KERNEL_LZ4):= lz4 -RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \ -$(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh) quiet_cmd_mkpiggy = MKPIGGY $@ - cmd_mkpiggy = $(obj)/mkpiggy $ $(RUN_SIZE) $@ || ( rm -f $@ ; false ) + cmd_mkpiggy = $(obj)/mkpiggy $ $@ || ( rm -f $@ ; false ) targets += piggy.S $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index 1d7fbbc..cbed140 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -207,8 +207,7 @@ relocated: * Do the decompression, and jump to the new kernel.. */ /* push arguments for decompress_kernel: */ - pushl $z_run_size /* size of kernel with .bss and .brk */ - pushl $z_output_len /* decompressed length, end of relocs */ + pushl $z_output_len /* decompressed length */ lealz_extract_offset_negative(%ebx), %ebp pushl %ebp/* output address */ pushl $z_input_len/* input_len */ @@ -218,7 +217,7 @@ relocated: pushl %eax/* heap area */ pushl %esi/* real mode pointer */ calldecompress_kernel /* returns kernel location in %eax */ - addl$28, %esp + addl$24, %esp /* * Jump to the decompressed kernel. diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 6b1766c..2884e0c 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -402,16 +402,13 @@ relocated: * Do the decompression, and jump to the new kernel.. */ pushq %rsi/* Save the real mode argument */ - movq$z_run_size, %r9/* size of kernel with .bss and .brk */ - pushq %r9 movq%rsi, %rdi /* real mode address */ leaqboot_heap(%rip), %rsi /* malloc area for uncompression */ leaq
Re: [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: We need to include that in boot::decompress_kernel stage to set new ident mapping. Also add checking for __pa/__va macro definition, as we need to override them in boot::decompress_kernel stage. Signed-off-by: Yinghai Lu ying...@kernel.org This seems good. :) Reviewed-by: Kees Cook keesc...@chromium.org -Kees --- arch/x86/include/asm/page.h | 5 +++ arch/x86/mm/ident_map.c | 74 + arch/x86/mm/init_64.c | 74 + 3 files changed, 80 insertions(+), 73 deletions(-) create mode 100644 arch/x86/mm/ident_map.c diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 802dde3..cf8f619 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr) #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE +#ifndef __pa #define __pa(x)__phys_addr((unsigned long)(x)) +#endif + #define __pa_nodebug(x)__phys_addr_nodebug((unsigned long)(x)) /* __pa_symbol should be used for C visible symbols. This seems to be the official gcc blessed way to do such arithmetic. */ @@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, #define __pa_symbol(x) \ __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x))) +#ifndef __va #define __va(x)((void *)((unsigned long)(x)+PAGE_OFFSET)) +#endif #define __boot_va(x) __va(x) #define __boot_pa(x) __pa(x) diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c new file mode 100644 index 000..751ca92 --- /dev/null +++ b/arch/x86/mm/ident_map.c @@ -0,0 +1,74 @@ + +static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page, + unsigned long addr, unsigned long end) +{ + addr = PMD_MASK; + for (; addr end; addr += PMD_SIZE) { + pmd_t *pmd = pmd_page + pmd_index(addr); + + if (!pmd_present(*pmd)) + set_pmd(pmd, __pmd(addr | pmd_flag)); + } +} +static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, + unsigned long addr, unsigned long end) +{ + unsigned long next; + + for (; addr end; addr = next) { + pud_t *pud = pud_page + pud_index(addr); + pmd_t *pmd; + + next = (addr PUD_MASK) + PUD_SIZE; + if (next end) + next = end; + + if (pud_present(*pud)) { + pmd = pmd_offset(pud, 0); + ident_pmd_init(info-pmd_flag, pmd, addr, next); + continue; + } + pmd = (pmd_t *)info-alloc_pgt_page(info-context); + if (!pmd) + return -ENOMEM; + ident_pmd_init(info-pmd_flag, pmd, addr, next); + set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE)); + } + + return 0; +} + +int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page, + unsigned long addr, unsigned long end) +{ + unsigned long next; + int result; + int off = info-kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0; + + for (; addr end; addr = next) { + pgd_t *pgd = pgd_page + pgd_index(addr) + off; + pud_t *pud; + + next = (addr PGDIR_MASK) + PGDIR_SIZE; + if (next end) + next = end; + + if (pgd_present(*pgd)) { + pud = pud_offset(pgd, 0); + result = ident_pud_init(info, pud, addr, next); + if (result) + return result; + continue; + } + + pud = (pud_t *)info-alloc_pgt_page(info-context); + if (!pud) + return -ENOMEM; + result = ident_pud_init(info, pud, addr, next); + if (result) + return result; + set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE)); + } + + return 0; +} diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 30eb05a..c30efb6 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -56,79 +56,7 @@ #include mm_internal.h -static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page, - unsigned long addr, unsigned long end) -{ - addr = PMD_MASK; - for (; addr end; addr += PMD_SIZE) { - pmd_t *pmd = pmd_page + pmd_index
Re: [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: Commit f47233c2d34f (x86/mm/ASLR: Propagate base load address calculation) started passing KASLR status to kernel proper, but it uses a physical address as the vaule, leading to parsing bogus KASLR status in kernel proper. The setup_data linked list and thus the element which contains kaslr_enabled is chained together using physical addresses. At the time when we access it in the kernel proper, we're already running with paging enabled and therefore must access it through its virtual address. This patch changes the code to use early_memmap() and access the value. -v3: add checking return from early_memmap according to Boris. -v4: add description about setup_data accessing from Boris to change log. Nit for next version: revision change notes usually go below the --- in a separate section, and end with ---, above the files changed report, and below the main changelog. -Kees Fixes: f47233c2d34f (x86/mm/ASLR: Propagate base load address calculation) Cc: Matt Fleming matt.flem...@intel.com Cc: Borislav Petkov b...@suse.de Cc: Kees Cook keesc...@chromium.org Cc: Jiri Kosina jkos...@suse.cz Acked-by: Jiri Kosina jkos...@suse.cz Signed-off-by: Yinghai Lu ying...@kernel.org --- arch/x86/kernel/setup.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 98dc931..912f124 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -429,7 +429,18 @@ static void __init reserve_initrd(void) static void __init parse_kaslr_setup(u64 pa_data, u32 data_len) { - kaslr_enabled = (bool)(pa_data + sizeof(struct setup_data)); + /* kaslr_setup_data is defined in aslr.c */ + unsigned char *data; + unsigned long offset = sizeof(struct setup_data); + + data = early_memremap(pa_data, offset + 1); + if (!data) { + kaslr_enabled = true; + return; + } + + kaslr_enabled = *(data + offset); + early_memunmap(data, offset + 1); } static void __init parse_setup_data(void) -- 1.8.4.5 -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: Now ZO sit end of the buffer, we can find out where is ZO text and data/bss etc. [input, input+input_size) is copied compressed kernel, not the whole ZO. [output, output+init_size) is the buffer for VO. [input+input_size, output+init_size) is [_text, _end) for ZO. that could be first range in mem_avoid. We don't need to guess that anymore. That area already include heap and stack for ZO running. So we don't need to put them into mem_avoid array. We need to put boot_params into the mem_avoid too. As with 64bit bootloader could put it anywhere. This may be a stupid question, but are boot_params being used outside of the compressed loader? If so, it might make sense to split that change into a separate patch to go to stable, if it's causing problems. (And document what problem is getting solved.) -Kees Also change output_size referring to init_size, as we pass init_size instead. Cc: Kees Cook keesc...@chromium.org Signed-off-by: Yinghai Lu ying...@kernel.org --- arch/x86/boot/compressed/aslr.c | 29 ++--- arch/x86/boot/compressed/misc.h | 4 ++-- 2 files changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c index 7083c16..a279514 100644 --- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -116,7 +116,7 @@ struct mem_vector { unsigned long size; }; -#define MEM_AVOID_MAX 5 +#define MEM_AVOID_MAX 4 static struct mem_vector mem_avoid[MEM_AVOID_MAX]; static bool mem_contains(struct mem_vector *region, struct mem_vector *item) @@ -142,7 +142,7 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two) } static void mem_avoid_init(unsigned long input, unsigned long input_size, - unsigned long output, unsigned long output_size) + unsigned long output, unsigned long init_size) { u64 initrd_start, initrd_size; u64 cmd_line, cmd_line_size; @@ -151,10 +151,13 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, /* * Avoid the region that is unsafe to overlap during -* decompression (see calculations at top of misc.c). +* decompression. +* As we already move ZO (arch/x86/boot/compressed/vmlinux) +* to the end of buffer, [input+input_size, output+init_size) +* has [_text, _end) for ZO. */ - unsafe_len = (output_size 12) + 32768 + 18; - unsafe = (unsigned long)input + input_size - unsafe_len; + unsafe_len = output + init_size - (input + input_size); + unsafe = (unsigned long)input + input_size; mem_avoid[0].start = unsafe; mem_avoid[0].size = unsafe_len; @@ -176,13 +179,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, mem_avoid[2].start = cmd_line; mem_avoid[2].size = cmd_line_size; - /* Avoid heap memory. */ - mem_avoid[3].start = (unsigned long)free_mem_ptr; - mem_avoid[3].size = BOOT_HEAP_SIZE; - - /* Avoid stack memory. */ - mem_avoid[4].start = (unsigned long)free_mem_end_ptr; - mem_avoid[4].size = BOOT_STACK_SIZE; + /* Avoid params */ + mem_avoid[3].start = (unsigned long)real_mode; + mem_avoid[3].size = sizeof(*real_mode); } /* Does this memory vector overlap a known avoided area? */ @@ -327,7 +326,7 @@ unsigned char *choose_kernel_location(struct boot_params *params, unsigned char *input, unsigned long input_size, unsigned char *output, - unsigned long output_size) + unsigned long init_size) { unsigned long choice = (unsigned long)output; unsigned long random; @@ -349,10 +348,10 @@ unsigned char *choose_kernel_location(struct boot_params *params, /* Record the various known unsafe memory ranges. */ mem_avoid_init((unsigned long)input, input_size, - (unsigned long)output, output_size); + (unsigned long)output, init_size); /* Walk e820 and find a random address. */ - random = find_random_addr(choice, output_size); + random = find_random_addr(choice, init_size); if (!random) { debug_putstr(KASLR could not find suitable E820 region...\n); goto out; diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index ee3576b..23156e7 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -61,7 +61,7 @@ unsigned char *choose_kernel_location(struct boot_params *params, unsigned char *input
Re: [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping
On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: First, aslr will support to put random VO above 4G, so we must set ident mapping for the range even we come from startup_32 path. Second, when boot from 64bit bootloader, bootloader set ident mapping, and boot via ZO (arch/x86/boot/compressed/vmlinux) startup_64. Those pages for pagetable need to be avoided when we select new random VO (vmlinux) base. Otherwise decompressor would overwrite them during decompressing. One solution: go through pagetable and find out every page is used by pagetable for every mem_aovid checking but we will need extra code. Other solution: create new ident mapping instead, and pages for pagetable will sit in _pagetable section of ZO, and they are in mem_avoid array already. In this way, we can reuse the code for setting ident mapping. The _pgtable will be shared 32bit and 64bit path to reduce init_size, as now ZO _rodata to _end will contribute init_size. Need to increase pgt buffer size. When boot via startup_64, as we need to cover old VO, params, cmdline and new VO, in extreme case we could have them all cross 512G boundary, will need (2+2)*4 pages with 2M mapping. And need 2 for first 2M for vga ram. Plus one for level4. Total will be 19 pages. When boot via startup_32, aslr would move new VO above 4G, we need set extra ident mapping for new VO, pgt buffer come from _pgtable offset 6 pages. should only need (2+2) pages at most when it cross 512G boundary. So 19 pages could make both pathes happy. -v3: add mapping for first 2M with video ram when X86_VERBOSE_BOOTUP is set. Don't need to set mapping for setup_data, as it is already late in boot/ZO stage, will not access it until VO stage, and VO stage will use early_memmap or kernel address to access them. Cc: Kees Cook keesc...@chromium.org Cc: Jiri Kosina jkos...@suse.cz Cc: Borislav Petkov b...@suse.de Cc: Matt Fleming matt.flem...@intel.com Signed-off-by: Yinghai Lu ying...@kernel.org --- arch/x86/boot/compressed/aslr.c | 21 arch/x86/boot/compressed/head_64.S | 4 +- arch/x86/boot/compressed/misc_pgt.c | 98 + arch/x86/include/asm/boot.h | 19 +++ 4 files changed, 140 insertions(+), 2 deletions(-) create mode 100644 arch/x86/boot/compressed/misc_pgt.c diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c index a279514..34eb652 100644 --- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -1,3 +1,8 @@ +#ifdef CONFIG_X86_64 +#define __pa(x) ((unsigned long)(x)) +#define __va(x) ((void *)((unsigned long)(x))) +#endif + #include misc.h #include asm/msr.h @@ -21,6 +26,8 @@ struct kaslr_setup_data { __u8 data[1]; } kaslr_setup_data; +#include misc_pgt.c Shouldn't this just be a normal built .o file that is linked together in the Makefile, specifically tracking CONFIG_RANDOMIZE_BASE as aslr.o already is? -Kees + #define I8254_PORT_CONTROL 0x43 #define I8254_PORT_COUNTER00x40 #define I8254_CMD_READBACK 0xC0 @@ -160,6 +167,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, unsafe = (unsigned long)input + input_size; mem_avoid[0].start = unsafe; mem_avoid[0].size = unsafe_len; + fill_pagetable(output, init_size); /* Avoid initrd. */ initrd_start = (u64)real_mode-ext_ramdisk_image 32; @@ -168,6 +176,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, initrd_size |= real_mode-hdr.ramdisk_size; mem_avoid[1].start = initrd_start; mem_avoid[1].size = initrd_size; + /* don't need to set mapping for initrd */ /* Avoid kernel command line. */ cmd_line = (u64)real_mode-ext_cmd_line_ptr 32; @@ -178,10 +187,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, ; mem_avoid[2].start = cmd_line; mem_avoid[2].size = cmd_line_size; + fill_pagetable(cmd_line, cmd_line_size); /* Avoid params */ mem_avoid[3].start = (unsigned long)real_mode; mem_avoid[3].size = sizeof(*real_mode); + fill_pagetable((unsigned long)real_mode, sizeof(*real_mode)); + + /* don't need to set mapping for setup_data */ + +#ifdef CONFIG_X86_VERBOSE_BOOTUP + /* for video ram */ + fill_pagetable(0, PMD_SIZE); +#endif } /* Does this memory vector overlap a known avoided area? */ @@ -362,6 +380,9 @@ unsigned char *choose_kernel_location(struct boot_params *params, goto out; choice = random; + + fill_pagetable(choice, init_size); + switch_pagetable(); out: return (unsigned char *)choice; } diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 69015b5..1b6e34a 100644 --- a/arch/x86/boot
Re: [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
On Mon, Mar 9, 2015 at 6:10 PM, Yinghai Lu ying...@kernel.org wrote: On Mon, Mar 9, 2015 at 6:00 PM, Kees Cook keesc...@chromium.org wrote: On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu ying...@kernel.org wrote: This may be a stupid question, but are boot_params being used outside of the compressed loader? If so, it might make sense to split that change into a separate patch to go to stable, if it's causing problems. (And document what problem is getting solved.) boot_params will keep the same and until it is passed x86_64_start_kernel in vmlinux. and there it will be copied, same as cmdline. but current kaslr only support random the base high, and it does not support kexec and current boot loader (grub2) put it really low. (under 1M). the real real_mode. :). Oh right! Of course, thanks for the details. Yes, it was implicitly ignored before, but now it needs to be explicitly ignored. Great, thanks! -Kees Thanks Yinghai -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 01/15] x86, kaslr: Use init_size instead of run_size
On Fri, Mar 6, 2015 at 10:44 AM, Yinghai Lu ying...@kernel.org wrote: On Fri, Mar 6, 2015 at 5:55 AM, Borislav Petkov b...@alien8.de wrote: On Wed, Mar 04, 2015 at 12:00:34AM -0800, Yinghai Lu wrote: commit e6023367d779 (x86, kaslr: Prevent .bss from overlaping initrd) introduced one run_size for kaslr. We do not need to have home grown run_size. We should use real runtime size (include copy/decompress) aka init_size Why? New change log: Subject: [PATCH] x86, kaslr: Use init_size instead of run_size commit e6023367d779 (x86, kaslr: Prevent .bss from overlaping initrd) introduced one run_size for kaslr. We should use real runtime size (include copy/decompress) aka init_size. run_size is size of VO (vmlinux). init_size is the size needed for decompress and it is bigger than run_size when decompress need more buff. According to arch/x86/boot/header.S: | #define ZO_INIT_SIZE(ZO__end - ZO_startup_32 + ZO_z_extract_offset) | #define VO_INIT_SIZE(VO__end - VO__text) | #if ZO_INIT_SIZE VO_INIT_SIZE | #define INIT_SIZE ZO_INIT_SIZE | #else | #define INIT_SIZE VO_INIT_SIZE | #endif | init_size: .long INIT_SIZE # kernel initialization size Bootloader allocate buffer according to init_size in hdr, and load the ZO (arch/x86/boot/compressed/vmlinux) from start of that buffer. During running of ZO, ZO move itself to the middle of buffer at z_extract_offset to make sure that decompressor would not have output overwrite input data before input data get consumed. But z_extract_offset calculating is based on size of VO (vmlinux) and size of compressed VO only at first. So need to make [z_extra_offset, init_size) will fit ZO, that means init_size need to be adjusted according to ZO size. That make init_size is always = run_size. During aslr buffer searching, we need to make sure the buffer is bigger enough for decompress at first. So use init_size instead, and kill not needed run_size related code. I don't see how bss and brk are related to these sizes. Can you explain how bss, brk, and initrd factor into these sizes? Those were what run_size was created to represent. I don't want to accidentally start stomping on bss and brk again. :) -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trusted kernel patchset for Secure Boot lockdown
On Fri, Mar 14, 2014 at 3:31 PM, One Thousand Gnomes gno...@lxorguk.ukuu.org.uk wrote: On Fri, 14 Mar 2014 22:15:45 + Matthew Garrett matthew.garr...@nebula.com wrote: On Fri, 2014-03-14 at 22:08 +, One Thousand Gnomes wrote: On Fri, 14 Mar 2014 21:56:33 + Matthew Garrett matthew.garr...@nebula.com wrote: Signed userspace is not a requirement, and therefore any solution that relies on a signed initrd is inadequate. There are use cases that require verification of the initrd and other levels. This isn't one of them. The job of the kernel is to solve the general problem. There are lots of people who happen to care about verification beyond the kernel so it shouldn't be ignored. And they can do do things like load trusted SELinux rulesets even if you can't support it in your environment. The general problem includes having to support this even without an selinux policy. Yes. No dispute about that. But equally the general solution should allow for it. And one that's not going to change, so the general problem includes not relying on a signed initramfs. Likewise some other way. ChromeOS will load unmeasured kernel modules provided it can attest to the trustworthyness of the filesystem containing them. See How to Bypass Verified Boot Security in Chromium OS 8) That method a) is intentionally available (system owner can disable firmware RO and install their own keys), and b) requires physical presence. So ChromeOS loads *measured* kernel modules. It just did the measuring differently to the signed module code. Right, using dm-verity. However, the read-only media case is still valid (and is why I am still trying to get the module restrictions LSM accepted, and why I'm modelling my firmware restrictions on the same principle). -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trusted kernel patchset for Secure Boot lockdown
On Fri, Mar 14, 2014 at 4:18 PM, Theodore Ts'o ty...@mit.edu wrote: On Fri, Mar 14, 2014 at 10:08:40PM +, One Thousand Gnomes wrote: Signed userspace is not a requirement, and therefore any solution that relies on a signed initrd is inadequate. There are use cases that require verification of the initrd and other levels. This isn't one of them. The job of the kernel is to solve the general problem. There are lots of people who happen to care about verification beyond the kernel so it shouldn't be ignored. And they can do do things like load trusted SELinux rulesets even if you can't support it in your environment. This is really a question about goals and trust models. Alan is arguing that we should support trust models and goals which go far beyond the goals and trust model of the UEFI Forum. Matthew is, I think, trying to make the argument that his patches fulfill the goals that are needed so we can boot Linux distribution kernels on UEFI secure boot machines without worrying about Microsoft deciding to revoke keys so that Red Hat or SuSE will no longer be able to be bootable on desktops that are certified for Windows 8. And while we might want to improve the framework to support other trust models later on, supporting distro kernels on Windows 8 certified PC's is important enough that we should let these patches into mainline. Is that a fair summary of the two viewpoints? UEFI is a red herring in both cases. This isn't about UEFI, it just happens that one of the things that can assert trusted_kernel is the UEFI Secure Boot path. Chrome OS would assert trusted_kernel by passing it on the kernel command line (since it, along with firmware, kernel, and root fs are effectively measured). A boot-my-router-from-CD system can assert similarly because the kernel is on RO media. Based on my view of the conversation, it's about whether or not capable() can be made to do these checks. The proposal about creating the action/privilege map is interesting, and I'm all for doing it just to make things easy to reason about for capabilities, but it still seems separate from this work. There still needs to be a global boolean for the trusted kernel state. It would be used, for example, on the module param whitelisting, which has nothing to do with capabilities. It would be used to change driver behavior (e.g. disabling DMA or other badness that isn't trusted), which has nothing to do with capabilities. The ability to check this state is going to be needed going into the future as we uncover more dangerous interfaces. Since the capability work is separate, when it happens, that same regex work could replace some of the is_trusted_kernel checks with new action tests, but we still need the same infrastructure and identification of dangerous interfaces. Capabilities can be seen as related to this patch set, but it cannot be seen as a blocker. This logic is needed today, it's implemented, and it clearly marks where the known problems are. If an overhaul of capabilities happens, it can happen separately. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trusted kernel patchset for Secure Boot lockdown
that covers this new thing we want to do and keeps the new policy distinctly separate. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trusted kernel patchset for Secure Boot lockdown
On Wed, Mar 12, 2014 at 10:01 PM, Matthew Garrett matthew.garr...@nebula.com wrote: On Fri, 2014-02-28 at 14:03 +1100, James Morris wrote: Ok, which tree should take this? I'm happy to, although most of it is outside security/ . Should I be looking for someone else to take them instead? :) Andrew, is this series[1] something you'd be okay taking? It touches many different areas, so you might be best for it. -Kees [1] https://lkml.org/lkml/2014/2/26/811 -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/12] Add option to automatically set trusted_kernel when in Secure Boot mode
On Wed, Feb 26, 2014 at 2:48 PM, Matthew Garrett matthew.garr...@nebula.com wrote: On Wed, 2014-02-26 at 22:41 +, One Thousand Gnomes wrote: Another issue that needs addressing is firmware. Quite a few of our request_firmware cases load device firmware which is not signed into DMA capable hardware. Probably also worth checking what the architectural guarantees on bogus microcode updates is. Maybe we need firmware signing for such cases to match the mod signing ? Vendors keep telling me that they're validating firmware for new hardware, and I keep tending not to believe them. Meh. The big problem with firmware signatures is that we don't necessarily have the right to distribute modified versions of the firmware, so we'd need detached signature support. I'm certainly not against this. I have been working on a patch series for this. It will have LSM hooks for validating firmware origin (via fd) and contents (via blob), similar to the changes I made for validating module origins. It just need to finish testing, and I'll post the series. If you want to check it out in its current state, it's here: http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=fw-restrict -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trusted kernel patchset for Secure Boot lockdown
On Wed, Feb 26, 2014 at 12:11 PM, Matthew Garrett matthew.garr...@nebula.com wrote: The conclusion we came to at Plumbers was that this patchset was basically fine but that Linus hated the name securelevel more than I hate pickled herring, so after thinking about this for a few months I've come up with Trusted Kernel. This flag indicates that the kernel is, via some external mechanism, trusted and should behave that way. If firmware has some way to verify the kernel, it can pass that information on. If userspace has some way to verify the kernel, it can set the flag itself. However, userspace should not attempt to use the flag as a means to verify that the kernel was trusted - untrusted userspace could have set it on an untrusted kernel, but by the same metric an untrusted kernel could just set it itself. If people object to this name then I swear to god that I will open a poll on Phoronix to decide the next attempt and you will like that even less. For the Chrome OS use-case, it might be better described as untrusted userspace, but that seems unfriendly. :) The trusted kernel name seems fine to me. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/12] One more attempt at useful kernel lockdown
On Tue, Sep 10, 2013 at 11:51 AM, gre...@linuxfoundation.org gre...@linuxfoundation.org wrote: On Tue, Sep 10, 2013 at 11:29:45AM -0700, H. Peter Anvin wrote: On 09/10/2013 11:26 AM, Matthew Garrett wrote: On Tue, 2013-09-10 at 14:23 -0300, Henrique de Moraes Holschuh wrote: On Tue, 10 Sep 2013, Matthew Garrett wrote: That's why modern systems require signed firmware updates. Linux doesn't. Is someone working on adding signature support to the runtime firmware loader? It'd be simple to do so, but so far the model appears to be that devices that expect signed firmware enforce that themselves. Most devices do absolutely no verification on the firmware, and simply trust the driver. So signing firmware is probably critical. How are you going to validate that the firmware is correct, given that it's just a blob living in the linux-firmware tree. If you sign it, what is that saying? In theory these blobs are traceable to a manufacturer. It's not really an indication that it's safe more than it's an indication that it hasn't been changed. But I haven't chased this very hard yet because of below... I'm with Matthew here, any device that needs/wants this, has their own built-in checking, nothing the kernel should do here. Especially given that no other os does this :) Yeah, it's impossible to handle since the way components do firmware updates is frequently exposed to userspace anyway. 3G modems that do firmware updates over the AT-command set, harddrives doing firmware updates over SCSI-generic commands, etc. Creating this barrier in the kernel is not a good solution; the component makers need to be doing the enforcement. :( -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/12] One more attempt at useful kernel lockdown
On Mon, Sep 9, 2013 at 4:30 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: On Mon, 2013-09-09 at 16:20 -0700, Kees Cook wrote: On Mon, Sep 9, 2013 at 4:19 PM, David Lang da...@lang.hm wrote: And if SELinux can do the job, what is the reason for creating this new option? Not everyone uses SELinux. :) Also, it's rarely controlled the things we want to control here. It comes on by default (or its equivalent: AppArmour) in almost every shipping distro. Right, if LSM was meant here, yeah, I do use an LSM. But they, as a class of security policy in the kernel, handle isolation of entirely different things. The goal of no way to mess with ring-0 isn't really related to the goals of the LSM in general, or specific MACs in particular. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 08/11] kexec: Disable at runtime if the kernel enforces module loading restrictions
On Sun, Sep 8, 2013 at 12:24 AM, Greg KH gre...@linuxfoundation.org wrote: On Sun, Sep 08, 2013 at 06:44:08AM +, Matthew Garrett wrote: On Sat, 2013-09-07 at 23:40 -0700, Greg KH wrote: If you apply this, you break everyone who is currently relying on kexec (i.e. kdump, bootloaders, etc.), from using signed kernel modules, which personally, seems like a very bad idea. Enforcing signed modules provides you with no additional security if you have kexec enabled. It's better to make that obvious. Then document the heck out of it, don't disable a valid use case just because it possibly could be used in some way that is different from the original system. If you take this to an extreme, kexec shouldn't be here at all, as it can do anything in the kernel wherever it wants to. kexec has nothing to do with signed modules, don't tie them together. It's not accurate to say it has nothing to do with signed modules. The purpose of signed modules is to ensure the integrity of the running system against the root user. It was, however, incomplete. Terrible analogy follows: signed modules was locking the front door, but we have all sorts of windows still open. This closes those windows. You're trying to say that shutting windows has nothing to do with lumber locks. While technically true, this is about the intent of the barriers. Anyone currently using signed modules (with sig_enforce) AND kexec is deluding themselves about what the state of their system's ring-0 security stance is. Those people should be running without sig_enforce, and if they want both sig_enforce and kexec, then I would expect a follow-up patch from them to provide signed kexec support. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re:
On Tue, Sep 3, 2013 at 4:50 PM, Matthew Garrett matthew.garr...@nebula.com wrote: We have two in-kernel mechanisms for restricting module loading - disabling it entirely, or limiting it to the loading of modules signed with a trusted key. These can both be configured in such a way that even root is unable to relax the restrictions. However, right now, there's several other straightforward ways for root to modify running kernel code. At the most basic level these allow root to reset the configuration such that modules can be loaded again, rendering the existing restrictions useless. This patchset adds additional restrictions to various kernel entry points that would otherwise make it straightforward for root to disable enforcement of module loading restrictions. It also provides a patch that allows the kernel to be configured such that module signing will be automatically enabled when the system is booting via UEFI Secure Boot, allowing a stronger guarantee of kernel integrity. V3 addresses some review feedback and also locks down uswsusp. Looks good to me. Consider the entire series: Acked-by: Kees Cook keesc...@chromium.org -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10] Add additional security checks when module loading is restricted
On Wed, Aug 28, 2013 at 3:58 PM, Lenny Szubowicz lszub...@redhat.com wrote: - Original Message - From: Matthew Garrett matthew.garr...@nebula.com To: Lenny Szubowicz lszub...@redhat.com Cc: linux-ker...@vger.kernel.org, linux-efi@vger.kernel.org, jwbo...@redhat.com, keesc...@chromium.org Sent: Wednesday, August 28, 2013 6:41:55 PM Subject: Re: [PATCH 0/10] Add additional security checks when module loading is restricted On Wed, 2013-08-28 at 18:37 -0400, Lenny Szubowicz wrote: Did you purposely exclude similar checks for hibernate that were covered by earlier versions of your patch set? Yes, I think it's worth tying it in with the encrypted hibernation support. The local attack is significantly harder in the hibernation case - in the face of unknown hardware it basically involves a pre-generated memory image corresponding to your system or the ability to force a reboot into an untrusted environment. I think it's probably more workable to just add a configuration option for forcing encrypted hibernation when secure boot is in use. -- Matthew Garrett matthew.garr...@nebula.com I'm root. So I can write anything I want to the swap file that looks like a valid hibernate image but is code of my choosing. I can read anything I need from /dev/mem or /dev/kmem to help me do that. I can then immediately initiate a reboot. Strictly speaking, RAM contents are not available via /dev/*mem, even to root. However, you can request a suspend image be written, but to not enter hibernation. Then modify the image, and request a resume from it. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/10] asus-wmi: Restrict debugfs interface when module loading is restricted
On Mon, Aug 19, 2013 at 9:10 AM, Matthew Garrett matthew.garr...@nebula.com wrote: We have no way of validating what all of the Asus WMI methods do on a given machine, and there's a risk that some will allow hardware state to be manipulated in such a way that arbitrary code can be executed in the kernel, circumventing module loading restrictions. Prevent that if any of these features are enabled. Signed-off-by: Matthew Garrett matthew.garr...@nebula.com --- drivers/platform/x86/asus-wmi.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c index 19c313b..8105530 100644 --- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -1618,6 +1618,9 @@ static int show_dsts(struct seq_file *m, void *data) int err; u32 retval = -1; + if (secure_modules()) + return -EPERM; + err = asus_wmi_get_devstate(asus, asus-debug.dev_id, retval); if (err 0) @@ -1634,6 +1637,9 @@ static int show_devs(struct seq_file *m, void *data) int err; u32 retval = -1; + if (!capable(CAP_COMPROMISE_KERNEL)) Looks like this and the next chunk weren't changed to the secure_modules() API... -Kees + return -EPERM; + err = asus_wmi_set_devstate(asus-debug.dev_id, asus-debug.ctrl_param, retval); @@ -1658,6 +1664,9 @@ static int show_call(struct seq_file *m, void *data) union acpi_object *obj; acpi_status status; + if (!capable(CAP_COMPROMISE_KERNEL)) + return -EPERM; + status = wmi_evaluate_method(ASUS_WMI_MGMT_GUID, 1, asus-debug.method_id, input, output); -- 1.8.3.1 -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10] Add additional security checks when module loading is restricted
On Mon, Aug 19, 2013 at 10:26 AM, Matthew Garrett matthew.garr...@nebula.com wrote: We have two in-kernel mechanisms for restricting module loading - disabling it entirely, or limiting it to the loading of modules signed with a trusted key. These can both be configured in such a way that even root is unable to relax the restrictions. However, right now, there's several other straightforward ways for root to modify running kernel code. At the most basic level these allow root to reset the configuration such that modules can be loaded again, rendering the existing restrictions useless. This patchset adds additional restrictions to various kernel entry points that would otherwise make it straightforward for root to disable enforcement of module loading restrictions. It also provides a patch that allows the kernel to be configured such that module signing will be automatically enabled when the system is booting via UEFI Secure Boot, allowing a stronger guarantee of kernel integrity. I like this approach of attaching it to the module loading logic. Given the LSM hook for loading modules, I think I'd like to add a hook for the secure_modules() check so that an LSM can respond secure as well, if it doing module loading mediation (for example, in the case of Chrome OS's modules-only-from-rootfs). I'll work up a patch... -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86: Lock down MSR writing in secure boot
On Fri, Feb 8, 2013 at 11:42 AM, H. Peter Anvin h...@zytor.com wrote: On 02/08/2013 11:18 AM, Kees Cook wrote: No. CAP_RAWIO is for reading. Writing needs a much stronger check. If so, I suspect we need to do this for *all* raw I/O... but I keep wondering how much more sensitive writing really is than reading. Well, I think there's a reasonable distinction between systems that expect to strictly enforce user-space/kernel-space separation (CAP_COMPROMISE_KERNEL) and things that are fiddling with hardware (CAP_SYS_RAWIO). For example, even things like /dev/mem already have this separation (although it is stronger). You can't open /dev/mem without CAP_SYS_RAWIO, but if you do, you still can't write to RAM in /dev/mem. This might be one of the earliest examples of this distinction, actually. I think it's likely that after a while, we can convert some of these proposed CAP_COMPROMISE_KERNEL checks in always-deny once we figure out how to deal with those areas more safely. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86: Lock down MSR writing in secure boot
On Fri, Feb 8, 2013 at 5:29 PM, Matthew Garrett matthew.garr...@nebula.com wrote: On Fri, 2013-02-08 at 17:22 -0800, H. Peter Anvin wrote: You don't have to build the kernel twice to exclude a loadable module. I guess you could just strip the signatures off any modules you don't want to support under Secure Boot, but that breaks some other use cases. Also, _reading_ MSRs from userspace arguably has utility that doesn't compromise ring-0. So excluding the driver entirely seems like overkill. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html