Re: [PATCH] ipmi: Don't allow device module unload when in use
Corey, Testing shows that this patch works as expected. Regards, Tony On 10/14/19 11:46 AM, miny...@acm.org wrote: From: Corey Minyard If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso Signed-off-by: Corey Minyard --- Tony, here is a suggested change for this. Can you look it over and see if it looks ok? Thanks, -corey drivers/char/ipmi/ipmi_msghandler.c | 23 --- include/linux/ipmi_smi.h| 12 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 2aab80e19ae0..15680de18625 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -448,6 +448,8 @@ enum ipmi_stat_indexes { #define IPMI_IPMB_NUM_SEQ 64 struct ipmi_smi { + struct module *owner; + /* What interface number are we? */ int intf_num; @@ -1220,6 +1222,11 @@ int ipmi_create_user(unsigned int if_num, if (rv) goto out_kfree; + if (!try_module_get(intf->owner)) { + rv = -ENODEV; + goto out_kfree; + } + /* Note that each existing user holds a refcount to the interface. */ kref_get(>refcount); @@ -1349,6 +1356,7 @@ static void _ipmi_destroy_user(struct ipmi_user *user) } kref_put(>refcount, intf_free); + module_put(intf->owner); } int ipmi_destroy_user(struct ipmi_user *user) @@ -2459,7 +2467,7 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) * been recently fetched, this will just use the cached data. Otherwise * it will run a new fetch. * - * Except for the first time this is called (in ipmi_register_smi()), + * Except for the first time this is called (in ipmi_add_smi()), * this will always return good data; */ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc, @@ -3377,10 +3385,11 @@ static void redo_bmc_reg(struct work_struct *work) kref_put(>refcount, intf_free); } -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*si_dev, - unsigned charslave_addr) +int ipmi_add_smi(struct module *owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device *si_dev, +unsigned char slave_addr) { int i, j; int rv; @@ -3406,7 +3415,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } - + intf->owner = owner; intf->bmc = >tmp_bmc; INIT_LIST_HEAD(>bmc->intfs); mutex_init(>bmc->dyn_mutex); @@ -3514,7 +3523,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } -EXPORT_SYMBOL(ipmi_register_smi); +EXPORT_SYMBOL(ipmi_add_smi); static void deliver_smi_err_response(struct ipmi_smi *intf, struct ipmi_smi_msg *msg, diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 4dc66157d872..deec18b8944a 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h @@ -224,10 +224,14 @@ static inline int ipmi_demangle_device_id(uint8_t netfn, uint8_t cmd, * is called, and the lower layer must get the interface from that * call. */ -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*dev, - unsigned charslave_addr); +int ipmi_add_smi(struct module*owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device*dev, +unsigned charslave_addr); + +#define ipmi_register_smi(handlers, send_info, dev, slave_addr) \ + ipmi_add_smi(THIS_MODULE, handlers, send_info, dev, slave_addr) /* * Remove a low-level interface from the IPMI driver. This will
Re: [PATCH] ipmi: Don't allow device module unload when in use
On 10/16/19 3:33 PM, Corey Minyard wrote: On Wed, Oct 16, 2019 at 03:25:56PM -0400, Tony Camuso wrote: On 10/14/19 11:46 AM, miny...@acm.org wrote: From: Corey Minyard If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso Signed-off-by: Corey Minyard --- Tony, here is a suggested change for this. Can you look it over and see if it looks ok? Thanks, -corey drivers/char/ipmi/ipmi_msghandler.c | 23 --- include/linux/ipmi_smi.h| 12 2 files changed, 24 insertions(+), 11 deletions(-) Hi Corey. You changed ipmi_register_ipmi to ipmi_add_ipmi in ipmi_msghandler, but you did not change it where it is actually called. # grep ipmi_register_smi drivers/char/ipmi/*.c drivers/char/ipmi/ipmi_powernv.c: rc = ipmi_register_smi(_powernv_smi_handlers, ipmi, dev, 0); drivers/char/ipmi/ipmi_si_intf.c: rv = ipmi_register_smi(, drivers/char/ipmi/ipmi_ssif.c: rv = ipmi_register_smi(_info->handlers, Is there a reason for changing the interface name? Is this something that I could do instead of troubling you with it? I don't understand. You don't say that anything went wrong, you just referenced something I changed. I changed the name so I could create a macro with that name to pass in the module name. Pretty standard to do in the kernel. Can't believe I missed that. Is there a compile or runtime issue? -corey All is well, so far. Haven't finished testing. Regards, Tony diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 2aab80e19ae0..15680de18625 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -448,6 +448,8 @@ enum ipmi_stat_indexes { #define IPMI_IPMB_NUM_SEQ64 struct ipmi_smi { + struct module *owner; + /* What interface number are we? */ int intf_num; @@ -1220,6 +1222,11 @@ int ipmi_create_user(unsigned int if_num, if (rv) goto out_kfree; + if (!try_module_get(intf->owner)) { + rv = -ENODEV; + goto out_kfree; + } + /* Note that each existing user holds a refcount to the interface. */ kref_get(>refcount); @@ -1349,6 +1356,7 @@ static void _ipmi_destroy_user(struct ipmi_user *user) } kref_put(>refcount, intf_free); + module_put(intf->owner); } int ipmi_destroy_user(struct ipmi_user *user) @@ -2459,7 +2467,7 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) * been recently fetched, this will just use the cached data. Otherwise * it will run a new fetch. * - * Except for the first time this is called (in ipmi_register_smi()), + * Except for the first time this is called (in ipmi_add_smi()), * this will always return good data; */ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc, @@ -3377,10 +3385,11 @@ static void redo_bmc_reg(struct work_struct *work) kref_put(>refcount, intf_free); } -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*si_dev, - unsigned charslave_addr) +int ipmi_add_smi(struct module *owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device *si_dev, +unsigned char slave_addr) { int i, j; int rv; @@ -3406,7 +3415,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } - + intf->owner = owner; intf->bmc = >tmp_bmc; INIT_LIST_HEAD(>bmc->intfs); mutex_init(>bmc->dyn_mutex); @@ -3514,7 +3523,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } -EXPORT_SYMBOL(ipmi_register_smi); +EXPORT_SYMBOL(ipmi_add_smi); static void deliver_smi_err_response(struct ipmi_smi *intf, struct ipmi_smi_msg *msg, diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 4dc66157d872..deec18b8944a 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h
Re: [PATCH] ipmi: Don't allow device module unload when in use
On 10/14/19 11:46 AM, miny...@acm.org wrote: From: Corey Minyard If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso Signed-off-by: Corey Minyard --- Tony, here is a suggested change for this. Can you look it over and see if it looks ok? Thanks, -corey drivers/char/ipmi/ipmi_msghandler.c | 23 --- include/linux/ipmi_smi.h| 12 2 files changed, 24 insertions(+), 11 deletions(-) Hi Corey. You changed ipmi_register_ipmi to ipmi_add_ipmi in ipmi_msghandler, but you did not change it where it is actually called. # grep ipmi_register_smi drivers/char/ipmi/*.c drivers/char/ipmi/ipmi_powernv.c: rc = ipmi_register_smi(_powernv_smi_handlers, ipmi, dev, 0); drivers/char/ipmi/ipmi_si_intf.c: rv = ipmi_register_smi(, drivers/char/ipmi/ipmi_ssif.c: rv = ipmi_register_smi(_info->handlers, Is there a reason for changing the interface name? Is this something that I could do instead of troubling you with it? Regards, Tony diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 2aab80e19ae0..15680de18625 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -448,6 +448,8 @@ enum ipmi_stat_indexes { #define IPMI_IPMB_NUM_SEQ 64 struct ipmi_smi { + struct module *owner; + /* What interface number are we? */ int intf_num; @@ -1220,6 +1222,11 @@ int ipmi_create_user(unsigned int if_num, if (rv) goto out_kfree; + if (!try_module_get(intf->owner)) { + rv = -ENODEV; + goto out_kfree; + } + /* Note that each existing user holds a refcount to the interface. */ kref_get(>refcount); @@ -1349,6 +1356,7 @@ static void _ipmi_destroy_user(struct ipmi_user *user) } kref_put(>refcount, intf_free); + module_put(intf->owner); } int ipmi_destroy_user(struct ipmi_user *user) @@ -2459,7 +2467,7 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) * been recently fetched, this will just use the cached data. Otherwise * it will run a new fetch. * - * Except for the first time this is called (in ipmi_register_smi()), + * Except for the first time this is called (in ipmi_add_smi()), * this will always return good data; */ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc, @@ -3377,10 +3385,11 @@ static void redo_bmc_reg(struct work_struct *work) kref_put(>refcount, intf_free); } -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*si_dev, - unsigned charslave_addr) +int ipmi_add_smi(struct module *owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device *si_dev, +unsigned char slave_addr) { int i, j; int rv; @@ -3406,7 +3415,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } - + intf->owner = owner; intf->bmc = >tmp_bmc; INIT_LIST_HEAD(>bmc->intfs); mutex_init(>bmc->dyn_mutex); @@ -3514,7 +3523,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } -EXPORT_SYMBOL(ipmi_register_smi); +EXPORT_SYMBOL(ipmi_add_smi); static void deliver_smi_err_response(struct ipmi_smi *intf, struct ipmi_smi_msg *msg, diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 4dc66157d872..deec18b8944a 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h @@ -224,10 +224,14 @@ static inline int ipmi_demangle_device_id(uint8_t netfn, uint8_t cmd, * is called, and the lower layer must get the interface from that * call. */ -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*dev, - unsigned charslave_addr); +int ipmi_add_smi(struct module
Re: [PATCH] ipmi: Don't allow device module unload when in use
On 10/16/19 8:18 AM, tony camuso wrote: On 10/14/19 11:46 AM, miny...@acm.org wrote: From: Corey Minyard If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso Signed-off-by: Corey Minyard --- Tony, here is a suggested change for this. Can you look it over and see if it looks ok? Thanks, -corey drivers/char/ipmi/ipmi_msghandler.c | 23 --- include/linux/ipmi_smi.h | 12 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 2aab80e19ae0..15680de18625 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -448,6 +448,8 @@ enum ipmi_stat_indexes { #define IPMI_IPMB_NUM_SEQ 64 struct ipmi_smi { + struct module *owner; + /* What interface number are we? */ int intf_num; @@ -1220,6 +1222,11 @@ int ipmi_create_user(unsigned int if_num, if (rv) goto out_kfree; + if (!try_module_get(intf->owner)) { + rv = -ENODEV; + goto out_kfree; + } + /* Note that each existing user holds a refcount to the interface. */ kref_get(>refcount); @@ -1349,6 +1356,7 @@ static void _ipmi_destroy_user(struct ipmi_user *user) } kref_put(>refcount, intf_free); + module_put(intf->owner); } int ipmi_destroy_user(struct ipmi_user *user) @@ -2459,7 +2467,7 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) * been recently fetched, this will just use the cached data. Otherwise * it will run a new fetch. * - * Except for the first time this is called (in ipmi_register_smi()), + * Except for the first time this is called (in ipmi_add_smi()), * this will always return good data; */ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc, @@ -3377,10 +3385,11 @@ static void redo_bmc_reg(struct work_struct *work) kref_put(>refcount, intf_free); } -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device *si_dev, - unsigned char slave_addr) +int ipmi_add_smi(struct module *owner, + const struct ipmi_smi_handlers *handlers, + void *send_info, + struct device *si_dev, + unsigned char slave_addr) { int i, j; int rv; @@ -3406,7 +3415,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } - + intf->owner = owner; intf->bmc = >tmp_bmc; INIT_LIST_HEAD(>bmc->intfs); mutex_init(>bmc->dyn_mutex); @@ -3514,7 +3523,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } -EXPORT_SYMBOL(ipmi_register_smi); +EXPORT_SYMBOL(ipmi_add_smi); static void deliver_smi_err_response(struct ipmi_smi *intf, struct ipmi_smi_msg *msg, diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 4dc66157d872..deec18b8944a 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h @@ -224,10 +224,14 @@ static inline int ipmi_demangle_device_id(uint8_t netfn, uint8_t cmd, * is called, and the lower layer must get the interface from that * call. */ -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device *dev, - unsigned char slave_addr); +int ipmi_add_smi(struct module *owner, + const struct ipmi_smi_handlers *handlers, + void *send_info, + struct device *dev, + unsigned char slave_addr); + +#define ipmi_register_smi(handlers, send_info, dev, slave_addr) \ + ipmi_add_smi(THIS_MODULE, handlers, send_info, dev, slave_addr) /* * Remove a low-level interface from the IPMI driver. This will Thanks, Corey. Regards, Tony And I meant to add that I will be testing this over the next couple days.
Re: [PATCH] ipmi: Don't allow device module unload when in use
On 10/14/19 11:46 AM, miny...@acm.org wrote: From: Corey Minyard If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso Signed-off-by: Corey Minyard --- Tony, here is a suggested change for this. Can you look it over and see if it looks ok? Thanks, -corey drivers/char/ipmi/ipmi_msghandler.c | 23 --- include/linux/ipmi_smi.h| 12 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 2aab80e19ae0..15680de18625 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -448,6 +448,8 @@ enum ipmi_stat_indexes { #define IPMI_IPMB_NUM_SEQ 64 struct ipmi_smi { + struct module *owner; + /* What interface number are we? */ int intf_num; @@ -1220,6 +1222,11 @@ int ipmi_create_user(unsigned int if_num, if (rv) goto out_kfree; + if (!try_module_get(intf->owner)) { + rv = -ENODEV; + goto out_kfree; + } + /* Note that each existing user holds a refcount to the interface. */ kref_get(>refcount); @@ -1349,6 +1356,7 @@ static void _ipmi_destroy_user(struct ipmi_user *user) } kref_put(>refcount, intf_free); + module_put(intf->owner); } int ipmi_destroy_user(struct ipmi_user *user) @@ -2459,7 +2467,7 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) * been recently fetched, this will just use the cached data. Otherwise * it will run a new fetch. * - * Except for the first time this is called (in ipmi_register_smi()), + * Except for the first time this is called (in ipmi_add_smi()), * this will always return good data; */ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc, @@ -3377,10 +3385,11 @@ static void redo_bmc_reg(struct work_struct *work) kref_put(>refcount, intf_free); } -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*si_dev, - unsigned charslave_addr) +int ipmi_add_smi(struct module *owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device *si_dev, +unsigned char slave_addr) { int i, j; int rv; @@ -3406,7 +3415,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } - + intf->owner = owner; intf->bmc = >tmp_bmc; INIT_LIST_HEAD(>bmc->intfs); mutex_init(>bmc->dyn_mutex); @@ -3514,7 +3523,7 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, return rv; } -EXPORT_SYMBOL(ipmi_register_smi); +EXPORT_SYMBOL(ipmi_add_smi); static void deliver_smi_err_response(struct ipmi_smi *intf, struct ipmi_smi_msg *msg, diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 4dc66157d872..deec18b8944a 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h @@ -224,10 +224,14 @@ static inline int ipmi_demangle_device_id(uint8_t netfn, uint8_t cmd, * is called, and the lower layer must get the interface from that * call. */ -int ipmi_register_smi(const struct ipmi_smi_handlers *handlers, - void *send_info, - struct device*dev, - unsigned charslave_addr); +int ipmi_add_smi(struct module*owner, +const struct ipmi_smi_handlers *handlers, +void *send_info, +struct device*dev, +unsigned charslave_addr); + +#define ipmi_register_smi(handlers, send_info, dev, slave_addr) \ + ipmi_add_smi(THIS_MODULE, handlers, send_info, dev, slave_addr) /* * Remove a low-level interface from the IPMI driver. This will Thanks, Corey. Regards, Tony
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 01:53 PM, Corey Minyard wrote: Ok, I have this queued for the next kernel, and in linux-next for testing. I added some information about when this was introduced. I also plan on sending this to stable, if that's ok with you. Yes, I've tested it for regressions, and it's fine. A note below On 06/19/2017 12:17 PM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- When I added this patch all the text below here disappeared, since --- is the marker for "end of text" :). I removed the --- here, but just FYI for the future. Oh, wow, thanks! I have scripts that look for "---", so I should know better. Thanks again, -corey [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/ch
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 01:53 PM, Corey Minyard wrote: Ok, I have this queued for the next kernel, and in linux-next for testing. I added some information about when this was introduced. I also plan on sending this to stable, if that's ok with you. Yes, I've tested it for regressions, and it's fine. A note below On 06/19/2017 12:17 PM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- When I added this patch all the text below here disappeared, since --- is the marker for "end of text" :). I removed the --- here, but just FYI for the future. Oh, wow, thanks! I have scripts that look for "---", so I should know better. Thanks again, -corey [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c i
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 01:17 PM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..49a7e96 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val) * because the lower layer is allowed to hold locks while calling * message delivery. */ + + rcu_read_lock(); + if (!run_to_completion) spin_lock_irqsave(>xmit_msgs_lock, flags); if (intf->curr_msg == NULL && !intf->in_shutdown) { @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val) if (newmsg
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 01:17 PM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..49a7e96 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val) * because the lower layer is allowed to hold locks while calling * message delivery. */ + + rcu_read_lock(); + if (!run_to_completion) spin_lock_irqsave(>xmit_msgs_lock, flags); if (intf->curr_msg == NULL && !intf->in_shutdown) { @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val) if (newmsg) intf-&g
[PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..49a7e96 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val) * because the lower layer is allowed to hold locks while calling * message delivery. */ + + rcu_read_lock(); + if (!run_to_completion) spin_lock_irqsave(>xmit_msgs_lock, flags); if (intf->curr_msg == NULL && !intf->in_shutdown) { @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val) if (newmsg) intf->handlers->sender(intf->send_info, newmsg); +
[PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters the following crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_msghandler.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..49a7e96 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val) * because the lower layer is allowed to hold locks while calling * message delivery. */ + + rcu_read_lock(); + if (!run_to_completion) spin_lock_irqsave(>xmit_msgs_lock, flags); if (intf->curr_msg == NULL && !intf->in_shutdown) { @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val) if (newmsg) intf->handlers->sender(intf->send_info, newmsg); + rcu_re
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 10:32 AM, Corey Minyard wrote: On 06/19/2017 09:29 AM, Tony Camuso wrote: On 06/19/2017 09:31 AM, Corey Minyard wrote: On 06/16/2017 07:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020 @.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. I was wrong about this. An rcu_read_lock()
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 10:32 AM, Corey Minyard wrote: On 06/19/2017 09:29 AM, Tony Camuso wrote: On 06/19/2017 09:31 AM, Corey Minyard wrote: On 06/16/2017 07:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020 @.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. I was wrong about this. An rcu_read_lock()
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 09:31 AM, Corey Minyard wrote: On 06/16/2017 07:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020 @.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. I was wrong about this. An rcu_read_lock() around the whole thing should be all that is required to fix this. I can do a patch, o
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/19/2017 09:31 AM, Corey Minyard wrote: On 06/16/2017 07:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020 @.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. I was wrong about this. An rcu_read_lock() around the whole thing should be all that is required to fix this. I can do a patch, o
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/16/2017 08:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. -corey Thanks, -corey Would this be adequate to prevent the race? Is the sender's mutex/spinlock sufficient to limit access? diff --git a/drivers/char/ipmi/ipmi_msghan
Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
On 06/16/2017 08:15 AM, Corey Minyard wrote: On 06/15/2017 10:54 AM, Corey Minyard wrote: On 06/13/2017 09:54 AM, Tony Camuso wrote: A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. I think you are right about this problem, but in_shutdown is checked already a bit before when newmsg is extracted from the list. Wouldn't it be better to add the rcu_read_lock() region starting right before the previous in_shutdown check to after the send? That would avoid a leak in this case. While lying awake unable to sleep, I realized that you can't call the sender function while holding rcu_read_lock(). That will break RT, because you can't claim a mutex while holding rcu_read_lock(), and the sender function will claim normal spinlocks. So I need to think about this a bit. -corey Thanks, -corey Would this be adequate to prevent the race? Is the sender's mutex/spinlock sufficient to limit access? diff --git a/drivers/char/ipmi/ipmi_msghan
[PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_msghandler.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..577509f 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3897,8 +3897,13 @@ static void smi_recv_tasklet(unsigned long val) } if (!run_to_completion) spin_unlock_irqrestore(>xmit_msgs_lock, flags); - if (newmsg) - intf->handlers->sender(intf->send_info, newmsg); + + if (newmsg) { + rcu_read_lock(); + if (!intf->in_shutdown) + intf->handlers->sender(intf->send_info, newmsg); + rcu_read_unlock(); + } handle_new_recv_msgs(intf); } -- 1.8.3.1
[PATCH] ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters a crash during shutdown. This is not an easily reproduceable event, but the vendor was able to provide the following analysis of the crash, which exhibits the same footprint each time. crash> bt PID: 0 TASK: 88017c70ce70 CPU: 5 COMMAND: "swapper/5" #0 [88085c143ac8] machine_kexec at 81059c8b #1 [88085c143b28] __crash_kexec at 811052e2 #2 [88085c143bf8] crash_kexec at 811053d0 #3 [88085c143c10] oops_end at 8168ef88 #4 [88085c143c38] no_context at 8167ebb3 #5 [88085c143c88] __bad_area_nosemaphore at 8167ec49 #6 [88085c143cd0] bad_area_nosemaphore at 8167edb3 #7 [88085c143ce0] __do_page_fault at 81691d1e #8 [88085c143d40] do_page_fault at 81691ec5 #9 [88085c143d70] page_fault at 8168e188 [exception RIP: unknown or invalid address] RIP: a053c800 RSP: 88085c143e28 RFLAGS: 00010206 RAX: 88017c72bfd8 RBX: 88017a8dc000 RCX: 8810588b5ac8 RDX: 8810588b5a00 RSI: a053c800 RDI: 8810588b5a00 RBP: 88085c143e58 R8: 88017c70d408 R9: 88017a8dc000 R10: 0002 R11: 88085c143da0 R12: 8810588b5ac8 R13: 0100 R14: a053c800 R15: 8810588b5a00 ORIG_RAX: CS: 0010 SS: 0018 --- --- [exception RIP: cpuidle_enter_state+82] RIP: 81514192 RSP: 88017c72be50 RFLAGS: 0202 RAX: 001e4c3c6f16 RBX: f8a0 RCX: 0018 RDX: 000225c17d03 RSI: 88017c72bfd8 RDI: 001e4c3c6f16 RBP: 88017c72be78 R8: 237e R9: 0018 R10: 2494 R11: 0001 R12: 88017c72be20 R13: 88085c14f8e0 R14: 0082 R15: 001e4c3bb400 ORIG_RAX: ff10 CS: 0010 SS: 0018 This is the corresponding stack trace It has crashed because the area pointed with RIP extracted from timer element is already removed during a shutdown process. The function is smi_timeout(). And we think 8810588b5a00 in RDX is a parameter struct smi_info crash> rd 8810588b5a00 20 8810588b5a00: 8810588b6000 .`.X 8810588b5a10: 880853264400 a05417e0 .D 8810588b5a20: 24a024a0 .$.$ 8810588b5a30: 8810588b5a40: a053a040 a053a060 @.S.`.S. 8810588b5a50: 00010001 8810588b5a60: 0e00 8810588b5a70: a053a580 a053a6e0 ..S...S. 8810588b5a80: a053a4a0 a053a250 ..S.P.S. 8810588b5a90: 00050002 Unfortunately the top of this area is already detroyed by someone. But because of two reasonns we think this is struct smi_info 1) The address included in between 8810588b5a70 and 8810588b5a80: are inside of ipmi_si_intf.c see crash> module 88085779d2c0 2) We've found the area which point this. It is offset 0x68 of 880859df4000 crash> rd 880859df4000 100 880859df4000: 0001 880859df4010: a0535290 dead0200 .RS. 880859df4020: 880859df4020 880859df4020@.Y @.Y 880859df4030: 0002 00100010 880859df4040: 880859df4040 880859df4040 @@.Y@@.Y 880859df4050: 880859df4060: 8810588b5a00 .Z.X 880859df4070: 0001 880859df4078 x@.Y If we regards it as struct ipmi_smi in shutdown process it looks consistent. The remedy for this apparent race is affixed below. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_msghandler.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 9f69995..577509f 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -3897,8 +3897,13 @@ static void smi_recv_tasklet(unsigned long val) } if (!run_to_completion) spin_unlock_irqrestore(>xmit_msgs_lock, flags); - if (newmsg) - intf->handlers->sender(intf->send_info, newmsg); + + if (newmsg) { + rcu_read_lock(); + if (!intf->in_shutdown) + intf->handlers->sender(intf->send_info, newmsg); + rcu_read_unlock(); + } handle_new_recv_msgs(intf); } -- 1.8.3.1
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso <tcam...@redhat.com> Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso <tcam...@redhat.com> There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Tested on qemu in various configurations. Signed-off-by: Corey Minyard <cminy...@mvista.com> --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Thanks Works perfectly. Regards, Tony drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Tested on qemu in various configurations. Signed-off-by: Corey Minyard --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Thanks Works perfectly. Regards, Tony drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso <tcam...@redhat.com> Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso <tcam...@redhat.com> There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Tested on qemu in various configurations. Signed-off-by: Corey Minyard <cminy...@mvista.com> --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Testing now ... Thanks drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Tested on qemu in various configurations. Signed-off-by: Corey Minyard --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Testing now ... Thanks drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso <tcam...@redhat.com> Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso <tcam...@redhat.com> There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Ouch! I thought smi_num was already locked by smi_infos. Now I see that it wasn't. Thanks! Regards, tony Tested on qemu in various configurations. Signed-off-by: Corey Minyard <cminy...@mvista.com> --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Thanks drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
Re: [PATCH] ipmi_si: use smi_num for init_name
On 04/10/2017 01:43 PM, miny...@acm.org wrote: From: Tony Camuso Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso There was actually a more general problem, the platform device wasn't being set correctly, either, and there was a possible (though extremely unlikely) race on smi_num. Add locks to clean up the race and use the proper value for the platform device, too. Ouch! I thought smi_num was already locked by smi_infos. Now I see that it wasn't. Thanks! Regards, tony Tested on qemu in various configurations. Signed-off-by: Corey Minyard --- I reworked this a bit, I think I've covered all the problems with smi_num. Tony, does this work for you? Thanks drivers/char/ipmi/ipmi_si_intf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..b2b618f 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1954,7 +1954,9 @@ static int hotmod_handler(const char *val, struct kernel_param *kp) kfree(info); goto out; } + mutex_lock(_infos_lock); rv = try_smi_init(info); + mutex_unlock(_infos_lock); if (rv) { cleanup_one_si(info); goto out; @@ -2042,8 +2044,10 @@ static int hardcode_find_bmc(void) info->slave_addr = slave_addrs[i]; if (!add_smi(info)) { + mutex_lock(_infos_lock); if (try_smi_init(info)) cleanup_one_si(info); + mutex_unlock(_infos_lock); ret = 0; } else { kfree(info); @@ -3492,6 +3496,11 @@ static int add_smi(struct smi_info *new_smi) return rv; } +/* + * Try to start up an interface. Must be called with smi_infos_lock + * held, primarily to keep smi_num consistent, we only one to do these + * one at a time. + */ static int try_smi_init(struct smi_info *new_smi) { int rv = 0; @@ -3524,9 +3533,12 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err; } + new_smi->intf_num = smi_num; + /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", + new_smi->intf_num); /* * If we don't already have a device from something @@ -3593,8 +3605,6 @@ static int try_smi_init(struct smi_info *new_smi) new_smi->interrupt_disabled = true; atomic_set(_smi->need_watch, 0); - new_smi->intf_num = smi_num; - smi_num++; rv = try_enable_event_buffer(new_smi); if (rv == 0) @@ -3661,6 +3671,9 @@ static int try_smi_init(struct smi_info *new_smi) goto out_err_stop_timer; } + /* Don't increment till we know we have succeeded. */ + smi_num++; + dev_info(new_smi->dev, "IPMI %s interface initialized\n", si_to_str[new_smi->si_type]);
[PATCH] ipmi_si: use smi_num for init_name
Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..9d8fc51 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3526,7 +3526,7 @@ static int try_smi_init(struct smi_info *new_smi) /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", smi_num); /* * If we don't already have a device from something -- 1.8.3.1
[PATCH] ipmi_si: use smi_num for init_name
Commit 1abf71e moved the creation of new_smi->dev to earlier in the init sequence in order to provide infrastructure for log printing. However, the init_name was created with a hard-coded value of zero. This presents a problem in systems with more than one interface, producing a call trace in dmesg. To correct the problem, simply use smi_num instead of the hard-coded value of zero. Tested on a lenovo x3950. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a7c425..9d8fc51 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3526,7 +3526,7 @@ static int try_smi_init(struct smi_info *new_smi) /* Do this early so it's available for logs. */ if (!new_smi->dev) { - init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", 0); + init_name = kasprintf(GFP_KERNEL, "ipmi_si.%d", smi_num); /* * If we don't already have a device from something -- 1.8.3.1
Re: [PATCH 1/1] ipmi: remove trydefaults parameter and default init
On 06/28/2016 08:11 AM, Corey Minyard wrote: On 06/28/2016 05:14 AM, Tony Camuso wrote: On 06/27/2016 09:34 PM, Corey Minyard wrote: Queued for 2.18, if that is ok. Thanks, -corey Yes, as long as there's enough time to shake it out before it's merged with Linus. I've done some testing on a couple archs, and I'll do more on some others as the opportunity arises. It's in linux-next, hopefully that provides enough shaking out. -corey That should suffice.
Re: [PATCH 1/1] ipmi: remove trydefaults parameter and default init
On 06/28/2016 08:11 AM, Corey Minyard wrote: On 06/28/2016 05:14 AM, Tony Camuso wrote: On 06/27/2016 09:34 PM, Corey Minyard wrote: Queued for 2.18, if that is ok. Thanks, -corey Yes, as long as there's enough time to shake it out before it's merged with Linus. I've done some testing on a couple archs, and I'll do more on some others as the opportunity arises. It's in linux-next, hopefully that provides enough shaking out. -corey That should suffice.
Re: [PATCH 1/1] ipmi: remove trydefaults parameter and default init
On 06/27/2016 09:34 PM, Corey Minyard wrote: Queued for 2.18, if that is ok. Thanks, -corey Yes, as long as there's enough time to shake it out before it's merged with Linus. I've done some testing on a couple archs, and I'll do more on some others as the opportunity arises. Regards, Tony On 06/22/2016 01:22 PM, Tony Camuso wrote: Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through the legacy port io space that was designated for ipmi. Architectures that do not map legacy port io can panic when trydefaults=1. Rather than implement build-time conditional exceptions for each architecture that does not map legacy port io, we have removed legacy port io from the driver. Parameter 'trydefaults' has been removed. Attempts to use it hereafter will evoke the "Unknown symbol in module, or unknown parameter" message. The patch was built against a number of architectures and tested for regressions and functionality on x86_64 and ARM64. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 73 1 file changed, 73 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 7b1c412..a112c01 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1322,7 +1322,6 @@ static bool si_tryplatform = true; #ifdef CONFIG_PCI static bool si_trypci = true; #endif -static bool si_trydefaults = IS_ENABLED(CONFIG_IPMI_SI_PROBE_DEFAULTS); static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 static char si_type_str[MAX_SI_TYPE_STR]; @@ -1371,10 +1370,6 @@ module_param_named(trypci, si_trypci, bool, 0); MODULE_PARM_DESC(trypci, "Setting this to zero will disable the" " default scan of the interfaces identified via pci"); #endif -module_param_named(trydefaults, si_trydefaults, bool, 0); -MODULE_PARM_DESC(trydefaults, "Setting this to 'false' will disable the" - " default scan of the KCS and SMIC interface at the standard" - " address"); module_param_string(type, si_type_str, MAX_SI_TYPE_STR, 0); MODULE_PARM_DESC(type, "Defines the type of each interface, each" " interface separated by commas. The types are 'kcs'," @@ -3461,62 +3456,6 @@ static inline void wait_for_timer_and_thread(struct smi_info *smi_info) del_timer_sync(_info->si_timer); } -static const struct ipmi_default_vals -{ -const int type; -const int port; -} ipmi_defaults[] = -{ -{ .type = SI_KCS, .port = 0xca2 }, -{ .type = SI_SMIC, .port = 0xca9 }, -{ .type = SI_BT, .port = 0xe4 }, -{ .port = 0 } -}; - -static void default_find_bmc(void) -{ -struct smi_info *info; -int i; - -for (i = 0; ; i++) { -if (!ipmi_defaults[i].port) -break; -#ifdef CONFIG_PPC -if (check_legacy_ioport(ipmi_defaults[i].port)) -continue; -#endif -info = smi_info_alloc(); -if (!info) -return; - -info->addr_source = SI_DEFAULT; - -info->si_type = ipmi_defaults[i].type; -info->io_setup = port_setup; -info->io.addr_data = ipmi_defaults[i].port; -info->io.addr_type = IPMI_IO_ADDR_SPACE; - -info->io.addr = NULL; -info->io.regspacing = DEFAULT_REGSPACING; -info->io.regsize = DEFAULT_REGSPACING; -info->io.regshift = 0; - -if (add_smi(info) == 0) { -if ((try_smi_init(info)) == 0) { -/* Found one... */ -printk(KERN_INFO PFX "Found default %s" -" state machine at %s address 0x%lx\n", -si_to_str[info->si_type], -addr_space_to_str[info->io.addr_type], -info->io.addr_data); -} else -cleanup_one_si(info); -} else { -kfree(info); -} -} -} - static int is_new_interface(struct smi_info *info) { struct smi_info *e; @@ -3844,8 +3783,6 @@ static int init_ipmi_si(void) #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; -/* poking PC IO addresses will crash machine, don't do it */ -si_trydefaults = 0; #endif /* We prefer devices with interrupts, but in the case of a machine @@ -3885,16 +3822,6 @@ static int init_ipmi_si(void) if (type) return 0; -if (si_trydefaults) { -mutex_lock(_infos_lock); -if (list_empty(_infos)) { -/* No BMC was found, try defaults. */ -mutex_unlock(_infos_lock); -default_find_bmc(); -} else -mutex_unlock(_infos_lock); -} - mutex_lock(_infos_lock); if (unload_when_empty && list_empty(_infos)) { mutex_unlock(_infos_lock);
Re: [PATCH 1/1] ipmi: remove trydefaults parameter and default init
On 06/27/2016 09:34 PM, Corey Minyard wrote: Queued for 2.18, if that is ok. Thanks, -corey Yes, as long as there's enough time to shake it out before it's merged with Linus. I've done some testing on a couple archs, and I'll do more on some others as the opportunity arises. Regards, Tony On 06/22/2016 01:22 PM, Tony Camuso wrote: Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through the legacy port io space that was designated for ipmi. Architectures that do not map legacy port io can panic when trydefaults=1. Rather than implement build-time conditional exceptions for each architecture that does not map legacy port io, we have removed legacy port io from the driver. Parameter 'trydefaults' has been removed. Attempts to use it hereafter will evoke the "Unknown symbol in module, or unknown parameter" message. The patch was built against a number of architectures and tested for regressions and functionality on x86_64 and ARM64. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 73 1 file changed, 73 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 7b1c412..a112c01 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1322,7 +1322,6 @@ static bool si_tryplatform = true; #ifdef CONFIG_PCI static bool si_trypci = true; #endif -static bool si_trydefaults = IS_ENABLED(CONFIG_IPMI_SI_PROBE_DEFAULTS); static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 static char si_type_str[MAX_SI_TYPE_STR]; @@ -1371,10 +1370,6 @@ module_param_named(trypci, si_trypci, bool, 0); MODULE_PARM_DESC(trypci, "Setting this to zero will disable the" " default scan of the interfaces identified via pci"); #endif -module_param_named(trydefaults, si_trydefaults, bool, 0); -MODULE_PARM_DESC(trydefaults, "Setting this to 'false' will disable the" - " default scan of the KCS and SMIC interface at the standard" - " address"); module_param_string(type, si_type_str, MAX_SI_TYPE_STR, 0); MODULE_PARM_DESC(type, "Defines the type of each interface, each" " interface separated by commas. The types are 'kcs'," @@ -3461,62 +3456,6 @@ static inline void wait_for_timer_and_thread(struct smi_info *smi_info) del_timer_sync(_info->si_timer); } -static const struct ipmi_default_vals -{ -const int type; -const int port; -} ipmi_defaults[] = -{ -{ .type = SI_KCS, .port = 0xca2 }, -{ .type = SI_SMIC, .port = 0xca9 }, -{ .type = SI_BT, .port = 0xe4 }, -{ .port = 0 } -}; - -static void default_find_bmc(void) -{ -struct smi_info *info; -int i; - -for (i = 0; ; i++) { -if (!ipmi_defaults[i].port) -break; -#ifdef CONFIG_PPC -if (check_legacy_ioport(ipmi_defaults[i].port)) -continue; -#endif -info = smi_info_alloc(); -if (!info) -return; - -info->addr_source = SI_DEFAULT; - -info->si_type = ipmi_defaults[i].type; -info->io_setup = port_setup; -info->io.addr_data = ipmi_defaults[i].port; -info->io.addr_type = IPMI_IO_ADDR_SPACE; - -info->io.addr = NULL; -info->io.regspacing = DEFAULT_REGSPACING; -info->io.regsize = DEFAULT_REGSPACING; -info->io.regshift = 0; - -if (add_smi(info) == 0) { -if ((try_smi_init(info)) == 0) { -/* Found one... */ -printk(KERN_INFO PFX "Found default %s" -" state machine at %s address 0x%lx\n", -si_to_str[info->si_type], -addr_space_to_str[info->io.addr_type], -info->io.addr_data); -} else -cleanup_one_si(info); -} else { -kfree(info); -} -} -} - static int is_new_interface(struct smi_info *info) { struct smi_info *e; @@ -3844,8 +3783,6 @@ static int init_ipmi_si(void) #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; -/* poking PC IO addresses will crash machine, don't do it */ -si_trydefaults = 0; #endif /* We prefer devices with interrupts, but in the case of a machine @@ -3885,16 +3822,6 @@ static int init_ipmi_si(void) if (type) return 0; -if (si_trydefaults) { -mutex_lock(_infos_lock); -if (list_empty(_infos)) { -/* No BMC was found, try defaults. */ -mutex_unlock(_infos_lock); -default_find_bmc(); -} else -mutex_unlock(_infos_lock); -} - mutex_lock(_infos_lock); if (unload_when_empty && list_empty(_infos)) { mutex_unlock(_infos_lock);
[PATCH 1/1] ipmi: remove trydefaults parameter and default init
Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through the legacy port io space that was designated for ipmi. Architectures that do not map legacy port io can panic when trydefaults=1. Rather than implement build-time conditional exceptions for each architecture that does not map legacy port io, we have removed legacy port io from the driver. Parameter 'trydefaults' has been removed. Attempts to use it hereafter will evoke the "Unknown symbol in module, or unknown parameter" message. The patch was built against a number of architectures and tested for regressions and functionality on x86_64 and ARM64. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 73 1 file changed, 73 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 7b1c412..a112c01 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1322,7 +1322,6 @@ static bool si_tryplatform = true; #ifdef CONFIG_PCI static bool si_trypci = true; #endif -static bool si_trydefaults = IS_ENABLED(CONFIG_IPMI_SI_PROBE_DEFAULTS); static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 static char si_type_str[MAX_SI_TYPE_STR]; @@ -1371,10 +1370,6 @@ module_param_named(trypci, si_trypci, bool, 0); MODULE_PARM_DESC(trypci, "Setting this to zero will disable the" " default scan of the interfaces identified via pci"); #endif -module_param_named(trydefaults, si_trydefaults, bool, 0); -MODULE_PARM_DESC(trydefaults, "Setting this to 'false' will disable the" -" default scan of the KCS and SMIC interface at the standard" -" address"); module_param_string(type, si_type_str, MAX_SI_TYPE_STR, 0); MODULE_PARM_DESC(type, "Defines the type of each interface, each" " interface separated by commas. The types are 'kcs'," @@ -3461,62 +3456,6 @@ static inline void wait_for_timer_and_thread(struct smi_info *smi_info) del_timer_sync(_info->si_timer); } -static const struct ipmi_default_vals -{ - const int type; - const int port; -} ipmi_defaults[] = -{ - { .type = SI_KCS, .port = 0xca2 }, - { .type = SI_SMIC, .port = 0xca9 }, - { .type = SI_BT, .port = 0xe4 }, - { .port = 0 } -}; - -static void default_find_bmc(void) -{ - struct smi_info *info; - int i; - - for (i = 0; ; i++) { - if (!ipmi_defaults[i].port) - break; -#ifdef CONFIG_PPC - if (check_legacy_ioport(ipmi_defaults[i].port)) - continue; -#endif - info = smi_info_alloc(); - if (!info) - return; - - info->addr_source = SI_DEFAULT; - - info->si_type = ipmi_defaults[i].type; - info->io_setup = port_setup; - info->io.addr_data = ipmi_defaults[i].port; - info->io.addr_type = IPMI_IO_ADDR_SPACE; - - info->io.addr = NULL; - info->io.regspacing = DEFAULT_REGSPACING; - info->io.regsize = DEFAULT_REGSPACING; - info->io.regshift = 0; - - if (add_smi(info) == 0) { - if ((try_smi_init(info)) == 0) { - /* Found one... */ - printk(KERN_INFO PFX "Found default %s" - " state machine at %s address 0x%lx\n", - si_to_str[info->si_type], - addr_space_to_str[info->io.addr_type], - info->io.addr_data); - } else - cleanup_one_si(info); - } else { - kfree(info); - } - } -} - static int is_new_interface(struct smi_info *info) { struct smi_info *e; @@ -3844,8 +3783,6 @@ static int init_ipmi_si(void) #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; - /* poking PC IO addresses will crash machine, don't do it */ - si_trydefaults = 0; #endif /* We prefer devices with interrupts, but in the case of a machine @@ -3885,16 +3822,6 @@ static int init_ipmi_si(void) if (type) return 0; - if (si_trydefaults) { - mutex_lock(_infos_lock); - if (list_empty(_infos)) { - /* No BMC was found, try defaults. */ - mutex_unlock(_infos_lock); - default_find_bmc(); - } else - mutex_unlock(_infos_lock); - } - mutex_lock(_infos_lock); if (unload_when_empty && list_empty(_infos)) { mutex_unlock(_infos_lock); -- 2.5.5
[PATCH 1/1] ipmi: remove trydefaults parameter and default init
Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through the legacy port io space that was designated for ipmi. Architectures that do not map legacy port io can panic when trydefaults=1. Rather than implement build-time conditional exceptions for each architecture that does not map legacy port io, we have removed legacy port io from the driver. Parameter 'trydefaults' has been removed. Attempts to use it hereafter will evoke the "Unknown symbol in module, or unknown parameter" message. The patch was built against a number of architectures and tested for regressions and functionality on x86_64 and ARM64. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 73 1 file changed, 73 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 7b1c412..a112c01 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1322,7 +1322,6 @@ static bool si_tryplatform = true; #ifdef CONFIG_PCI static bool si_trypci = true; #endif -static bool si_trydefaults = IS_ENABLED(CONFIG_IPMI_SI_PROBE_DEFAULTS); static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 static char si_type_str[MAX_SI_TYPE_STR]; @@ -1371,10 +1370,6 @@ module_param_named(trypci, si_trypci, bool, 0); MODULE_PARM_DESC(trypci, "Setting this to zero will disable the" " default scan of the interfaces identified via pci"); #endif -module_param_named(trydefaults, si_trydefaults, bool, 0); -MODULE_PARM_DESC(trydefaults, "Setting this to 'false' will disable the" -" default scan of the KCS and SMIC interface at the standard" -" address"); module_param_string(type, si_type_str, MAX_SI_TYPE_STR, 0); MODULE_PARM_DESC(type, "Defines the type of each interface, each" " interface separated by commas. The types are 'kcs'," @@ -3461,62 +3456,6 @@ static inline void wait_for_timer_and_thread(struct smi_info *smi_info) del_timer_sync(_info->si_timer); } -static const struct ipmi_default_vals -{ - const int type; - const int port; -} ipmi_defaults[] = -{ - { .type = SI_KCS, .port = 0xca2 }, - { .type = SI_SMIC, .port = 0xca9 }, - { .type = SI_BT, .port = 0xe4 }, - { .port = 0 } -}; - -static void default_find_bmc(void) -{ - struct smi_info *info; - int i; - - for (i = 0; ; i++) { - if (!ipmi_defaults[i].port) - break; -#ifdef CONFIG_PPC - if (check_legacy_ioport(ipmi_defaults[i].port)) - continue; -#endif - info = smi_info_alloc(); - if (!info) - return; - - info->addr_source = SI_DEFAULT; - - info->si_type = ipmi_defaults[i].type; - info->io_setup = port_setup; - info->io.addr_data = ipmi_defaults[i].port; - info->io.addr_type = IPMI_IO_ADDR_SPACE; - - info->io.addr = NULL; - info->io.regspacing = DEFAULT_REGSPACING; - info->io.regsize = DEFAULT_REGSPACING; - info->io.regshift = 0; - - if (add_smi(info) == 0) { - if ((try_smi_init(info)) == 0) { - /* Found one... */ - printk(KERN_INFO PFX "Found default %s" - " state machine at %s address 0x%lx\n", - si_to_str[info->si_type], - addr_space_to_str[info->io.addr_type], - info->io.addr_data); - } else - cleanup_one_si(info); - } else { - kfree(info); - } - } -} - static int is_new_interface(struct smi_info *info) { struct smi_info *e; @@ -3844,8 +3783,6 @@ static int init_ipmi_si(void) #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; - /* poking PC IO addresses will crash machine, don't do it */ - si_trydefaults = 0; #endif /* We prefer devices with interrupts, but in the case of a machine @@ -3885,16 +3822,6 @@ static int init_ipmi_si(void) if (type) return 0; - if (si_trydefaults) { - mutex_lock(_infos_lock); - if (list_empty(_infos)) { - /* No BMC was found, try defaults. */ - mutex_unlock(_infos_lock); - default_find_bmc(); - } else - mutex_unlock(_infos_lock); - } - mutex_lock(_infos_lock); if (unload_when_empty && list_empty(_infos)) { mutex_unlock(_infos_lock); -- 2.5.5
Re: [PATCH] ipmi: set si_trydefaults=0 for ARM64
On 06/20/2016 04:57 PM, Corey Minyard wrote: On 06/20/2016 01:26 PM, Tony Camuso wrote: Port I/O space does not exist in ARM64 and is not mapped. Attempts to access it on ARM systems cause stack traces and worse. At this point, I think it is best to just completely pull out all concept of "default addresses" in the IPMI driver. The defaults were disabled by default in 3.16, this is as good an impetus as any to just get rid of them. If you want, you can do a patch, or I can pull them out if you would prefer that. I'll give it a go. Should have something later today. Regards, Tony Thanks, -corey Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 615abbf..85dcc86 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3841,6 +3841,11 @@ static int init_ipmi_si(void) spmi_find_bmc(); #endif +#ifdef CONFIG_ARM64 +/* Don't touch port io space */ +si_trydefaults = 0; +#endif + #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true;
Re: [PATCH] ipmi: set si_trydefaults=0 for ARM64
On 06/20/2016 04:57 PM, Corey Minyard wrote: On 06/20/2016 01:26 PM, Tony Camuso wrote: Port I/O space does not exist in ARM64 and is not mapped. Attempts to access it on ARM systems cause stack traces and worse. At this point, I think it is best to just completely pull out all concept of "default addresses" in the IPMI driver. The defaults were disabled by default in 3.16, this is as good an impetus as any to just get rid of them. If you want, you can do a patch, or I can pull them out if you would prefer that. I'll give it a go. Should have something later today. Regards, Tony Thanks, -corey Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 615abbf..85dcc86 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3841,6 +3841,11 @@ static int init_ipmi_si(void) spmi_find_bmc(); #endif +#ifdef CONFIG_ARM64 +/* Don't touch port io space */ +si_trydefaults = 0; +#endif + #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true;
[PATCH] ipmi: set si_trydefaults=0 for ARM64
Port I/O space does not exist in ARM64 and is not mapped. Attempts to access it on ARM systems cause stack traces and worse. Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 615abbf..85dcc86 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3841,6 +3841,11 @@ static int init_ipmi_si(void) spmi_find_bmc(); #endif +#ifdef CONFIG_ARM64 + /* Don't touch port io space */ + si_trydefaults = 0; +#endif + #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; -- 1.8.3.1
[PATCH] ipmi: set si_trydefaults=0 for ARM64
Port I/O space does not exist in ARM64 and is not mapped. Attempts to access it on ARM systems cause stack traces and worse. Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 615abbf..85dcc86 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3841,6 +3841,11 @@ static int init_ipmi_si(void) spmi_find_bmc(); #endif +#ifdef CONFIG_ARM64 + /* Don't touch port io space */ + si_trydefaults = 0; +#endif + #ifdef CONFIG_PARISC register_parisc_driver(_parisc_driver); parisc_registered = true; -- 1.8.3.1
[PATCH 1/1] ipmi: move timer init to before irq is setup
From: Jan Stancek We encountered a panic on boot in ipmi_si on a dell per320 due to an uninitialized timer as follows. static int smi_start_processing(void *send_info, ipmi_smi_t intf) { /* Try to claim any interrupts. */ if (new_smi->irq_setup) new_smi->irq_setup(new_smi); --> IRQ arrives here and irq handler tries to modify uninitialized timer which triggers BUG_ON(!timer->function) in __mod_timer(). Call Trace: [] start_new_msg+0x47/0x80 [ipmi_si] [] start_check_enables+0x4e/0x60 [ipmi_si] [] smi_event_handler+0x1e8/0x640 [ipmi_si] [] ? __rcu_process_callbacks+0x54/0x350 [] si_irq_handler+0x3c/0x60 [ipmi_si] [] handle_IRQ_event+0x60/0x170 [] handle_edge_irq+0xde/0x180 [] handle_irq+0x49/0xa0 [] do_IRQ+0x6c/0xf0 [] ret_from_intr+0x0/0x11 /* Set up the timer that drives the interface. */ setup_timer(_smi->si_timer, smi_timeout, (long)new_smi); The following patch fixes the problem. To: openipmi-develo...@lists.sourceforge.net To: Corey Minyard CC: linux-kernel@vger.kernel.org Signed-off-by: Jan Stancek Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 55fe902..4cc72fa 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1230,14 +1230,14 @@ static int smi_start_processing(void *send_info, new_smi->intf = intf; - /* Try to claim any interrupts. */ - if (new_smi->irq_setup) - new_smi->irq_setup(new_smi); - /* Set up the timer that drives the interface. */ setup_timer(_smi->si_timer, smi_timeout, (long)new_smi); smi_mod_timer(new_smi, jiffies + SI_TIMEOUT_JIFFIES); + /* Try to claim any interrupts. */ + if (new_smi->irq_setup) + new_smi->irq_setup(new_smi); + /* * Check if the user forcefully enabled the daemon. */ -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] ipmi: move timer init to before irq is setup
From: Jan Stancek <jstan...@redhat.com> We encountered a panic on boot in ipmi_si on a dell per320 due to an uninitialized timer as follows. static int smi_start_processing(void *send_info, ipmi_smi_t intf) { /* Try to claim any interrupts. */ if (new_smi->irq_setup) new_smi->irq_setup(new_smi); --> IRQ arrives here and irq handler tries to modify uninitialized timer which triggers BUG_ON(!timer->function) in __mod_timer(). Call Trace: [] start_new_msg+0x47/0x80 [ipmi_si] [] start_check_enables+0x4e/0x60 [ipmi_si] [] smi_event_handler+0x1e8/0x640 [ipmi_si] [] ? __rcu_process_callbacks+0x54/0x350 [] si_irq_handler+0x3c/0x60 [ipmi_si] [] handle_IRQ_event+0x60/0x170 [] handle_edge_irq+0xde/0x180 [] handle_irq+0x49/0xa0 [] do_IRQ+0x6c/0xf0 [] ret_from_intr+0x0/0x11 /* Set up the timer that drives the interface. */ setup_timer(_smi->si_timer, smi_timeout, (long)new_smi); The following patch fixes the problem. To: openipmi-develo...@lists.sourceforge.net To: Corey Minyard <miny...@acm.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Jan Stancek <jstan...@redhat.com> Signed-off-by: Tony Camuso <tcam...@redhat.com> --- drivers/char/ipmi/ipmi_si_intf.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 55fe902..4cc72fa 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1230,14 +1230,14 @@ static int smi_start_processing(void *send_info, new_smi->intf = intf; - /* Try to claim any interrupts. */ - if (new_smi->irq_setup) - new_smi->irq_setup(new_smi); - /* Set up the timer that drives the interface. */ setup_timer(_smi->si_timer, smi_timeout, (long)new_smi); smi_mod_timer(new_smi, jiffies + SI_TIMEOUT_JIFFIES); + /* Try to claim any interrupts. */ + if (new_smi->irq_setup) + new_smi->irq_setup(new_smi); + /* * Check if the user forcefully enabled the daemon. */ -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
This patch should have been part of the previous patch having the same summary. See http://marc.info/?l=linux-kernel=143039470103795=2 Unfortunately, I didn't check to see where else this lock was used before submitting that patch. This should take care of it for netxen_nic, as I did a thorough search this time. To recap from the original patch; although testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces, it would be more prudent in the case of tx_clean_lock to use _bh versions of spin_[un]lock, since this lock is manipulated in both the process and softirq contexts. This patch was tested for functionality and regressions with netperf and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled. Signed-off-by: Tony Camuso --- drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index 2da9627..6301bae 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c @@ -1766,7 +1766,7 @@ int netxen_process_cmd_ring(struct netxen_adapter *adapter) int done = 0; struct nx_host_tx_ring *tx_ring = adapter->tx_ring; - if (!spin_trylock(>tx_clean_lock)) + if (!spin_trylock_bh(>tx_clean_lock)) return 1; sw_consumer = tx_ring->sw_consumer; @@ -1821,7 +1821,7 @@ int netxen_process_cmd_ring(struct netxen_adapter *adapter) */ hw_consumer = le32_to_cpu(*(tx_ring->hw_consumer)); done = (sw_consumer == hw_consumer); - spin_unlock(>tx_clean_lock); + spin_unlock_bh(>tx_clean_lock); return done; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
This patch should have been part of the previous patch having the same summary. See http://marc.info/?l=linux-kernelm=143039470103795w=2 Unfortunately, I didn't check to see where else this lock was used before submitting that patch. This should take care of it for netxen_nic, as I did a thorough search this time. To recap from the original patch; although testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces, it would be more prudent in the case of tx_clean_lock to use _bh versions of spin_[un]lock, since this lock is manipulated in both the process and softirq contexts. This patch was tested for functionality and regressions with netperf and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled. Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index 2da9627..6301bae 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c @@ -1766,7 +1766,7 @@ int netxen_process_cmd_ring(struct netxen_adapter *adapter) int done = 0; struct nx_host_tx_ring *tx_ring = adapter-tx_ring; - if (!spin_trylock(adapter-tx_clean_lock)) + if (!spin_trylock_bh(adapter-tx_clean_lock)) return 1; sw_consumer = tx_ring-sw_consumer; @@ -1821,7 +1821,7 @@ int netxen_process_cmd_ring(struct netxen_adapter *adapter) */ hw_consumer = le32_to_cpu(*(tx_ring-hw_consumer)); done = (sw_consumer == hw_consumer); - spin_unlock(adapter-tx_clean_lock); + spin_unlock_bh(adapter-tx_clean_lock); return done; } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netxen_nic: use spin_[un]lock_bh around tx_clean_lock
While testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces, it would be more prudent in the case of tx_clean_lock to use spin_[un]lock_bh, since this lock is manipulated in both the process and softirq contexts. This patch was tested for functionality and regressions with netperf and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled. Signed-off-by: Tony Camuso --- drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index d746c97..2da9627 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c @@ -137,7 +137,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter) int i, j; struct nx_host_tx_ring *tx_ring = adapter->tx_ring; - spin_lock(>tx_clean_lock); + spin_lock_bh(>tx_clean_lock); cmd_buf = tx_ring->cmd_buf_arr; for (i = 0; i < tx_ring->num_desc; i++) { buffrag = cmd_buf->frag_array; @@ -161,7 +161,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter) } cmd_buf++; } - spin_unlock(>tx_clean_lock); + spin_unlock_bh(>tx_clean_lock); } void netxen_free_sw_resources(struct netxen_adapter *adapter) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netxen_nic: use spin_[un]lock_bh around tx_clean_lock
While testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces, it would be more prudent in the case of tx_clean_lock to use spin_[un]lock_bh, since this lock is manipulated in both the process and softirq contexts. This patch was tested for functionality and regressions with netperf and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled. Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index d746c97..2da9627 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c @@ -137,7 +137,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter) int i, j; struct nx_host_tx_ring *tx_ring = adapter-tx_ring; - spin_lock(adapter-tx_clean_lock); + spin_lock_bh(adapter-tx_clean_lock); cmd_buf = tx_ring-cmd_buf_arr; for (i = 0; i tx_ring-num_desc; i++) { buffrag = cmd_buf-frag_array; @@ -161,7 +161,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter) } cmd_buf++; } - spin_unlock(adapter-tx_clean_lock); + spin_unlock_bh(adapter-tx_clean_lock); } void netxen_free_sw_resources(struct netxen_adapter *adapter) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] drivers/acpi: acpi_ipmi.c replace mutex with spin_lock_irqsave
From: tcam...@redhat.com From: Tony Camuso We were getting occasional "Scheduling while atomic" call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. = [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 - inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (_device->tx_msg_lock){+.?...}, at: [] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [] __lock_acquire+0x63c/0x1570 [] lock_acquire+0xa4/0x120 [] __mutex_lock_common+0x4c/0x400 [] mutex_lock_nested+0x4a/0x60 [] acpi_ipmi_space_handler+0x11b/0x234 [] acpi_ev_address_space_dispatch+0x170/0x1be The ipmi_msg_handler() function in drivers/acpi/acpi_ipmi was taking a mutex_lock after ultimately being called from a call chain initiated by function smi_rcv_tasklet() in drivers/char/ipmi/ipmi_msghandler.c. Documentation/mutex-design.txt on lines 93 and 94 says, "mutexes may not be used in hardware or software interrupt contexts such as tasklets and timers." The patch changes the mutex member tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device to a spinlock_t and replaces mutex_lock/unlock pairs with spin_lock_irqsave/unlock_irqrestore around tx_msg_lock for accesses to that struct. Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Signed-off-by: Tony Camuso --- drivers/acpi/acpi_ipmi.c | 24 ++-- 1 files changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index f40acef..a4b428d 100644 --- a/drivers/acpi/acpi_ipmi.c +++ b/drivers/acpi/acpi_ipmi.c @@ -57,7 +57,7 @@ struct acpi_ipmi_device { struct list_head head; /* the IPMI request message list */ struct list_head tx_msg_list; - struct mutextx_msg_lock; + spinlock_t tx_msg_lock; acpi_handle handle; struct pnp_dev *pnp_dev; ipmi_user_t user_interface; @@ -147,6 +147,7 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, struct kernel_ipmi_msg *msg; struct acpi_ipmi_buffer *buffer; struct acpi_ipmi_device *device; + unsigned long flags; msg = _msg->tx_message; /* @@ -177,10 +178,10 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, /* Get the msgid */ device = tx_msg->device; - mutex_lock(>tx_msg_lock); + spin_lock_irqsave(>tx_msg_lock, flags); device->curr_msgid++; tx_msg->tx_msgid = device->curr_msgid; - mutex_unlock(>tx_msg_lock); + spin_unlock_irqrestore(>tx_msg_lock, flags); } static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg, @@ -242,6 +243,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) int msg_found = 0; struct acpi_ipmi_msg *tx_msg; struct pnp_dev *pnp_dev = ipmi_device->pnp_dev; + unsigned long flags; if (msg->user != ipmi_device->user_interface) { dev_warn(_dev->dev, "Unexpected response is returned. " @@ -250,7 +252,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) ipmi_free_recv_msg(msg); return; } - mutex_lock(_device->tx_msg_lock); + spin_lock_irqsave(_device->tx_msg_lock, flags); list_for_each_entry(tx_msg, _device->tx_msg_list, head) { if (msg->msgid == tx_msg->tx_msgid) { msg_found = 1; @@ -258,7 +260,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) } } - mutex_unlock(_device->tx_msg_lock); + spin_unlock_irqrestore(_device->tx_msg_lock, flags); if (!msg_found) { dev_warn(_dev->dev, "Unexpected response (msg id %ld) is " "returned.\n", msg->msgid); @@ -378,6 +380,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, struct acpi_ipmi_device *ipmi_device = handler_context; int err, rem_time; acpi_status status; + unsigned long flags; + /* * IPMI opregion message. * IPMI message is firstly written to the BMC and system software @@ -395,9 +399,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, return AE_NO_MEMORY; acpi_format_ipmi_msg(tx_msg, address, value); - mutex_lock(_device->tx_msg_lock); + spin_lock_irqsave(_device->tx_msg_lock, flags); list_add_tail(_msg->
[PATCH 1/1] drivers/acpi: acpi_ipmi.c replace mutex with spin_lock_irqsave
From: tcam...@redhat.com From: Tony Camuso tcam...@redhat.com We were getting occasional Scheduling while atomic call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. = [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 - inconsistent {SOFTIRQ-ON-W} - {IN-SOFTIRQ-W} usage. ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (ipmi_device-tx_msg_lock){+.?...}, at: [81337a27] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [810ba11c] __lock_acquire+0x63c/0x1570 [810bb0f4] lock_acquire+0xa4/0x120 [815581cc] __mutex_lock_common+0x4c/0x400 [815586ea] mutex_lock_nested+0x4a/0x60 [8133789d] acpi_ipmi_space_handler+0x11b/0x234 [81321c62] acpi_ev_address_space_dispatch+0x170/0x1be The ipmi_msg_handler() function in drivers/acpi/acpi_ipmi was taking a mutex_lock after ultimately being called from a call chain initiated by function smi_rcv_tasklet() in drivers/char/ipmi/ipmi_msghandler.c. Documentation/mutex-design.txt on lines 93 and 94 says, mutexes may not be used in hardware or software interrupt contexts such as tasklets and timers. The patch changes the mutex member tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device to a spinlock_t and replaces mutex_lock/unlock pairs with spin_lock_irqsave/unlock_irqrestore around tx_msg_lock for accesses to that struct. Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/acpi/acpi_ipmi.c | 24 ++-- 1 files changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index f40acef..a4b428d 100644 --- a/drivers/acpi/acpi_ipmi.c +++ b/drivers/acpi/acpi_ipmi.c @@ -57,7 +57,7 @@ struct acpi_ipmi_device { struct list_head head; /* the IPMI request message list */ struct list_head tx_msg_list; - struct mutextx_msg_lock; + spinlock_t tx_msg_lock; acpi_handle handle; struct pnp_dev *pnp_dev; ipmi_user_t user_interface; @@ -147,6 +147,7 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, struct kernel_ipmi_msg *msg; struct acpi_ipmi_buffer *buffer; struct acpi_ipmi_device *device; + unsigned long flags; msg = tx_msg-tx_message; /* @@ -177,10 +178,10 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, /* Get the msgid */ device = tx_msg-device; - mutex_lock(device-tx_msg_lock); + spin_lock_irqsave(device-tx_msg_lock, flags); device-curr_msgid++; tx_msg-tx_msgid = device-curr_msgid; - mutex_unlock(device-tx_msg_lock); + spin_unlock_irqrestore(device-tx_msg_lock, flags); } static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg, @@ -242,6 +243,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) int msg_found = 0; struct acpi_ipmi_msg *tx_msg; struct pnp_dev *pnp_dev = ipmi_device-pnp_dev; + unsigned long flags; if (msg-user != ipmi_device-user_interface) { dev_warn(pnp_dev-dev, Unexpected response is returned. @@ -250,7 +252,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) ipmi_free_recv_msg(msg); return; } - mutex_lock(ipmi_device-tx_msg_lock); + spin_lock_irqsave(ipmi_device-tx_msg_lock, flags); list_for_each_entry(tx_msg, ipmi_device-tx_msg_list, head) { if (msg-msgid == tx_msg-tx_msgid) { msg_found = 1; @@ -258,7 +260,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) } } - mutex_unlock(ipmi_device-tx_msg_lock); + spin_unlock_irqrestore(ipmi_device-tx_msg_lock, flags); if (!msg_found) { dev_warn(pnp_dev-dev, Unexpected response (msg id %ld) is returned.\n, msg-msgid); @@ -378,6 +380,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, struct acpi_ipmi_device *ipmi_device = handler_context; int err, rem_time; acpi_status status; + unsigned long flags; + /* * IPMI opregion message. * IPMI message is firstly written to the BMC and system software @@ -395,9 +399,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, return AE_NO_MEMORY; acpi_format_ipmi_msg(tx_msg, address, value); - mutex_lock(ipmi_device-tx_msg_lock); + spin_lock_irqsave(ipmi_device
Re: [PATCH] tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
On 05/27/2013 03:51 PM, Peter Huewe wrote: > The 8119807 commit reintroduced a regression > (error: __ksymtab_tpm_dev_release causes a section type conflict) that was > fixed by commit > cbb2ed4. > Fix it for good by adding the prototype to tpm.h so sparse doesn't > complain about it anymore. > > Reported-by: Tony Camuso > Signed-off-by: Peter Huewe > --- > James, can you please take this one directly and push it to next please? > As it causes a build failure on ppc > > drivers/char/tpm/tpm.c | 2 +- > drivers/char/tpm/tpm.h | 1 + > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c > index 7c3b3dc..e3c974a 100644 > --- a/drivers/char/tpm/tpm.c > +++ b/drivers/char/tpm/tpm.c > @@ -1472,7 +1472,7 @@ EXPORT_SYMBOL_GPL(tpm_dev_vendor_release); > * Once all references to platform device are down to 0, > * release all allocated structures. > */ > -static void tpm_dev_release(struct device *dev) > +void tpm_dev_release(struct device *dev) > { > struct tpm_chip *chip = dev_get_drvdata(dev); > > diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h > index 4334232..a7bfc17 100644 > --- a/drivers/char/tpm/tpm.h > +++ b/drivers/char/tpm/tpm.h > @@ -332,6 +332,7 @@ extern struct tpm_chip* tpm_register_hardware(struct > device *, >const struct tpm_vendor_specific *); > extern int tpm_open(struct inode *, struct file *); > extern int tpm_release(struct inode *, struct file *); > +extern void tpm_dev_release(struct device *dev); > extern void tpm_dev_vendor_release(struct tpm_chip *); > extern ssize_t tpm_write(struct file *, const char __user *, size_t, >loff_t *); > Thanks, Peter. I should've mentioned that it didn't cause problems with x86 build, only with ppc. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
On 05/27/2013 03:51 PM, Peter Huewe wrote: The 8119807 commit reintroduced a regression (error: __ksymtab_tpm_dev_release causes a section type conflict) that was fixed by commit cbb2ed4. Fix it for good by adding the prototype to tpm.h so sparse doesn't complain about it anymore. Reported-by: Tony Camuso tcam...@redhat.com Signed-off-by: Peter Huewe peterhu...@gmx.de --- James, can you please take this one directly and push it to next please? As it causes a build failure on ppc drivers/char/tpm/tpm.c | 2 +- drivers/char/tpm/tpm.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c index 7c3b3dc..e3c974a 100644 --- a/drivers/char/tpm/tpm.c +++ b/drivers/char/tpm/tpm.c @@ -1472,7 +1472,7 @@ EXPORT_SYMBOL_GPL(tpm_dev_vendor_release); * Once all references to platform device are down to 0, * release all allocated structures. */ -static void tpm_dev_release(struct device *dev) +void tpm_dev_release(struct device *dev) { struct tpm_chip *chip = dev_get_drvdata(dev); diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 4334232..a7bfc17 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -332,6 +332,7 @@ extern struct tpm_chip* tpm_register_hardware(struct device *, const struct tpm_vendor_specific *); extern int tpm_open(struct inode *, struct file *); extern int tpm_release(struct inode *, struct file *); +extern void tpm_dev_release(struct device *dev); extern void tpm_dev_vendor_release(struct tpm_chip *); extern ssize_t tpm_write(struct file *, const char __user *, size_t, loff_t *); Thanks, Peter. I should've mentioned that it didn't cause problems with x86 build, only with ppc. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipmi: replace call to try_smi_init() with call to add_smi()
Rescinding. This hunk is already upstream. On 01/08/2013 03:12 PM, Tony Camuso wrote: Upstream commit 9e368fa0 added function ipmi_pnp_probe(), which calls try_smi_init() directly. try_smi_init() allocates resources for IPMI driver. However try_smi_init() can be called again, and the allocated resources can be unexpectedly released when the same IPMI resource information is found on both ACPI namespace and SMBIOS table. ipmi_pnp_probe() should call add_smi() instead of try_smi_init() For reference, see upstream commit ... 2407d77a1a013b88ee3b817f2b934e420e5376f5 Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index afb89be..20d13c6 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2223,7 +2223,7 @@ static int __devinit ipmi_pnp_probe(struct pnp_dev *dev, info->dev = >dev; pnp_set_drvdata(dev, info); -return try_smi_init(info); +return add_smi(info); err_free: kfree(info); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipmi: replace call to try_smi_init() with call to add_smi()
Rescinding. This hunk is already upstream. On 01/08/2013 03:12 PM, Tony Camuso wrote: Upstream commit 9e368fa0 added function ipmi_pnp_probe(), which calls try_smi_init() directly. try_smi_init() allocates resources for IPMI driver. However try_smi_init() can be called again, and the allocated resources can be unexpectedly released when the same IPMI resource information is found on both ACPI namespace and SMBIOS table. ipmi_pnp_probe() should call add_smi() instead of try_smi_init() For reference, see upstream commit ... 2407d77a1a013b88ee3b817f2b934e420e5376f5 Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/char/ipmi/ipmi_si_intf.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index afb89be..20d13c6 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2223,7 +2223,7 @@ static int __devinit ipmi_pnp_probe(struct pnp_dev *dev, info-dev = dev-dev; pnp_set_drvdata(dev, info); -return try_smi_init(info); +return add_smi(info); err_free: kfree(info); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipmi: replace call to try_smi_init() with call to add_smi()
Upstream commit 9e368fa0 added function ipmi_pnp_probe(), which calls try_smi_init() directly. try_smi_init() allocates resources for IPMI driver. However try_smi_init() can be called again, and the allocated resources can be unexpectedly released when the same IPMI resource information is found on both ACPI namespace and SMBIOS table. ipmi_pnp_probe() should call add_smi() instead of try_smi_init() For reference, see upstream commit ... 2407d77a1a013b88ee3b817f2b934e420e5376f5 Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index afb89be..20d13c6 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2223,7 +2223,7 @@ static int __devinit ipmi_pnp_probe(struct pnp_dev *dev, info->dev = >dev; pnp_set_drvdata(dev, info); - return try_smi_init(info); + return add_smi(info); err_free: kfree(info); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipmi: replace call to try_smi_init() with call to add_smi()
Upstream commit 9e368fa0 added function ipmi_pnp_probe(), which calls try_smi_init() directly. try_smi_init() allocates resources for IPMI driver. However try_smi_init() can be called again, and the allocated resources can be unexpectedly released when the same IPMI resource information is found on both ACPI namespace and SMBIOS table. ipmi_pnp_probe() should call add_smi() instead of try_smi_init() For reference, see upstream commit ... 2407d77a1a013b88ee3b817f2b934e420e5376f5 Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/char/ipmi/ipmi_si_intf.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index afb89be..20d13c6 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2223,7 +2223,7 @@ static int __devinit ipmi_pnp_probe(struct pnp_dev *dev, info-dev = dev-dev; pnp_set_drvdata(dev, info); - return try_smi_init(info); + return add_smi(info); err_free: kfree(info); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipmi: add new kernel options to prevent automatic ipmi init
RHEL builds the ipmi_si into the kernel by default, rather than as a module, because it is required early in order to be available for ACPI opregion access. However, it appears that some of our customers have custom ipmi drivers, and this gets in their way. Stratus is currently evaluating your suggestions, and we are expecting a response from them sometime today or early next week. On 12/13/2012 02:51 PM, Corey Minyard wrote: Well, the built-in driver works on systems that have more than one interface and more than one BMC, and multiple IPMBs (and all of the other channel types for that matter, and the driver handles all the multiplexing and nasty addressing). There is, in fact, no arbitrary limit, and IBM tested this fairly extensively with some of their systems. I'm not sure why you would need a custom driver, and if there are some custom things that need to be done for your servers, I'd be happy to add that. I've worked with a number of other vendors to get changes like this in. And then ipmitool, freeipmi, openipmi, etc. will work with the device. I don't have a big problem with this patch. I wonder why you would want to compile the standard driver into your kernel if you expected to load a module with a custom driver later, though. None of the distros I know of compile it in, it's always a module. You can also dynamically remove the device from the driver using the hot add/remove capability. To remove it, you can do: echo remove,`cat /proc/ipmi/0/params` and it will disassociate that device (IPMI interface 0 in this case) from the driver. So you can iterate through all the devices in /proc/ipmi and remove them all at startup. If none of the above options work for you, we can go ahead with this patch. Just wanted to let you know that current options exist, and see if you wanted to take a different direction. -corey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipmi: add new kernel options to prevent automatic ipmi init
RHEL builds the ipmi_si into the kernel by default, rather than as a module, because it is required early in order to be available for ACPI opregion access. However, it appears that some of our customers have custom ipmi drivers, and this gets in their way. Stratus is currently evaluating your suggestions, and we are expecting a response from them sometime today or early next week. On 12/13/2012 02:51 PM, Corey Minyard wrote: Well, the built-in driver works on systems that have more than one interface and more than one BMC, and multiple IPMBs (and all of the other channel types for that matter, and the driver handles all the multiplexing and nasty addressing). There is, in fact, no arbitrary limit, and IBM tested this fairly extensively with some of their systems. I'm not sure why you would need a custom driver, and if there are some custom things that need to be done for your servers, I'd be happy to add that. I've worked with a number of other vendors to get changes like this in. And then ipmitool, freeipmi, openipmi, etc. will work with the device. I don't have a big problem with this patch. I wonder why you would want to compile the standard driver into your kernel if you expected to load a module with a custom driver later, though. None of the distros I know of compile it in, it's always a module. You can also dynamically remove the device from the driver using the hot add/remove capability. To remove it, you can do: echo remove,`cat /proc/ipmi/0/params` and it will disassociate that device (IPMI interface 0 in this case) from the driver. So you can iterate through all the devices in /proc/ipmi and remove them all at startup. If none of the above options work for you, we can go ahead with this patch. Just wanted to let you know that current options exist, and see if you wanted to take a different direction. -corey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipmi: add new kernel options to prevent automatic ipmi init
The configuration change building ipmi_si into the kernel precludes the use of a custom driver that can utilize more than one KCS interface, multiple IPMBs, and more than one BMC. This capability is important for fault-tolerant systems. Even if the kernel option ipmi_si.trydefaults=0 is specified, ipmi_si discovers and claims one of the KCS interfaces on a Stratus server. The inability to now prevent the kernel from managing this device is a regression from previous kernels. The regression breaks a capability fault-tolerant vendors have relied upon. To support both ACPI opregion access and the need to avoid activation of ipmi_si on some platforms, we've added two new kernel options, ipmi_si.tryacpi and ipmi_si.trydmi be added to prevent ipmi_si from initializing when these options are set to 0 on the kernel command line. With these options at the default value of 1, ipmi_si init proceeds according to the kernel default. Tested-by: Jim Paradis Signed-off-by: Robert Evans Signed-off-by: Jim Paradis Signed-off-by: Tony Camuso --- drivers/char/ipmi/ipmi_si_intf.c | 28 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 20ab5b3..0a441cf 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1208,6 +1208,12 @@ static int smi_num; /* Used to sequence the SMIs */ #define DEFAULT_REGSPACING 1 #define DEFAULT_REGSIZE1 +#ifdef CONFIG_ACPI +static int si_tryacpi = 1; +#endif +#ifdef CONFIG_DMI +static int si_trydmi = 1; +#endif static bool si_trydefaults = 1; static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 @@ -1238,6 +1244,16 @@ MODULE_PARM_DESC(hotmod, "Add and remove interfaces. See" " Documentation/IPMI.txt in the kernel sources for the" " gory details."); +#ifdef CONFIG_ACPI +module_param_named(tryacpi, si_tryacpi, bool, 0); +MODULE_PARM_DESC(tryacpi, "Setting this to zero will disable the" +" default scan of the interfaces identified via ACPI"); +#endif +#ifdef CONFIG_DMI +module_param_named(trydmi, si_trydmi, bool, 0); +MODULE_PARM_DESC(trydmi, "Setting this to zero will disable the" +" default scan of the interfaces identified via DMI"); +#endif module_param_named(trydefaults, si_trydefaults, bool, 0); MODULE_PARM_DESC(trydefaults, "Setting this to 'false' will disable the" " default scan of the KCS and SMIC interface at the standard" @@ -3408,16 +3424,20 @@ static int init_ipmi_si(void) #endif #ifdef CONFIG_ACPI - pnp_register_driver(_pnp_driver); - pnp_registered = 1; + if (si_tryacpi) { + pnp_register_driver(_pnp_driver); + pnp_registered = 1; + } #endif #ifdef CONFIG_DMI - dmi_find_bmc(); + if (si_trydmi) + dmi_find_bmc(); #endif #ifdef CONFIG_ACPI - spmi_find_bmc(); + if (si_tryacpi) + spmi_find_bmc(); #endif /* We prefer devices with interrupts, but in the case of a machine -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipmi: add new kernel options to prevent automatic ipmi init
The configuration change building ipmi_si into the kernel precludes the use of a custom driver that can utilize more than one KCS interface, multiple IPMBs, and more than one BMC. This capability is important for fault-tolerant systems. Even if the kernel option ipmi_si.trydefaults=0 is specified, ipmi_si discovers and claims one of the KCS interfaces on a Stratus server. The inability to now prevent the kernel from managing this device is a regression from previous kernels. The regression breaks a capability fault-tolerant vendors have relied upon. To support both ACPI opregion access and the need to avoid activation of ipmi_si on some platforms, we've added two new kernel options, ipmi_si.tryacpi and ipmi_si.trydmi be added to prevent ipmi_si from initializing when these options are set to 0 on the kernel command line. With these options at the default value of 1, ipmi_si init proceeds according to the kernel default. Tested-by: Jim Paradis jpara...@redhat.com Signed-off-by: Robert Evans robert.ev...@stratus.com Signed-off-by: Jim Paradis jpara...@redhat.com Signed-off-by: Tony Camuso tcam...@redhat.com --- drivers/char/ipmi/ipmi_si_intf.c | 28 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 20ab5b3..0a441cf 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1208,6 +1208,12 @@ static int smi_num; /* Used to sequence the SMIs */ #define DEFAULT_REGSPACING 1 #define DEFAULT_REGSIZE1 +#ifdef CONFIG_ACPI +static int si_tryacpi = 1; +#endif +#ifdef CONFIG_DMI +static int si_trydmi = 1; +#endif static bool si_trydefaults = 1; static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 @@ -1238,6 +1244,16 @@ MODULE_PARM_DESC(hotmod, Add and remove interfaces. See Documentation/IPMI.txt in the kernel sources for the gory details.); +#ifdef CONFIG_ACPI +module_param_named(tryacpi, si_tryacpi, bool, 0); +MODULE_PARM_DESC(tryacpi, Setting this to zero will disable the + default scan of the interfaces identified via ACPI); +#endif +#ifdef CONFIG_DMI +module_param_named(trydmi, si_trydmi, bool, 0); +MODULE_PARM_DESC(trydmi, Setting this to zero will disable the + default scan of the interfaces identified via DMI); +#endif module_param_named(trydefaults, si_trydefaults, bool, 0); MODULE_PARM_DESC(trydefaults, Setting this to 'false' will disable the default scan of the KCS and SMIC interface at the standard @@ -3408,16 +3424,20 @@ static int init_ipmi_si(void) #endif #ifdef CONFIG_ACPI - pnp_register_driver(ipmi_pnp_driver); - pnp_registered = 1; + if (si_tryacpi) { + pnp_register_driver(ipmi_pnp_driver); + pnp_registered = 1; + } #endif #ifdef CONFIG_DMI - dmi_find_bmc(); + if (si_trydmi) + dmi_find_bmc(); #endif #ifdef CONFIG_ACPI - spmi_find_bmc(); + if (si_tryacpi) + spmi_find_bmc(); #endif /* We prefer devices with interrupts, but in the case of a machine -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Change pci_raw_ops to pci_raw_read/write
Arjan van de Ven wrote: On Thu, 07 Feb 2008 10:54:05 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: Matthew, Perhaps I missed it, but did you address Yinghai's concerns? Yinghai Lu wrote: On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: -int pci_conf1_write(unsigned int seg, unsigned int bus, +static int pci_conf1_write(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 value) any reason to change pci_conf1_read/write to static? nothing should use these directly. So static is the right answer ;) Agreed. Thanks, Arjan. Matthew, What about the ATA_RAM addition to Kconfig? Was it accidental, or intended? If intended, how is it related? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Change pci_raw_ops to pci_raw_read/write
Matthew, Perhaps I missed it, but did you address Yinghai's concerns? Yinghai Lu wrote: On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: -int pci_conf1_write(unsigned int seg, unsigned int bus, +static int pci_conf1_write(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 value) any reason to change pci_conf1_read/write to static? +config ATA_RAM + tristate "ATA RAM driver" + related? YH -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Matthew Wilcox wrote: On Tue, Jan 29, 2008 at 07:29:51AM -0800, Arjan van de Ven wrote: Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers for non-x86 archs with an additional call to access space that they've never had a problem with. lets say s/x86/x86, IA64 and architectures that use intel, amd or via chipsets/ Umm .. ia64 already does exactly what I'm proposing for x86. It uses one SAL interface for bytes below 256 and a different SAL interface for bytes 256-4095. Not exactly. :) The interface is the same, ia64_sal_pci_config_write() and ia64_sal_pci_config_read(), but a flag bit in the mode argument is used to tell the SAL interface whether to translate the offset component of the config address as having 8 or 12 bits of of displacement. In my estimation, Ivan's patch, in his implementation of Loic's suggestion, is even more elegant, since there is no need to flag whether the access is for offsets below 256. Ivan's code automatically uses Port IO (or equivalent with Matthew's patch) for offsets below 256 and MMCONFIG for offsets from 256 to 4096. And even better, it removes the bitmap that tracks MMCONFIG-unfriendly devices for the first 16 buses, a solution that assumes systems with bus numbers higher than 16 will get MMCONFIG right, which turned out to be a very wrong assumption. Furthermore, the config address is translated by the Northbridge. The delivery mechanism to the Northbridge, whether Port IO or MMCONFIG, is utterly opaque to the devices on the bus, since all they see is PCI config cycles, not Port IO or MMCONFIG cycles. The test only needed to be made at the Northbridge level, not at the device level. Ivan's patch removes all this cruft. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Tue, 29 Jan 2008 10:15:45 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: specific to legacy x86 hardware is, IMNSHO, a kludge. in addition to pci_enable(), pci_enable_msi(), pci_enable_busmaster() they already need to do to enable various features? These calls are related to generic aspects of the PCI* landscape itself and are not related to any arch-specific hardware, nor were they devised to address chipset-specific or BIOS-specific problems. For the good of all, we should endeavor to avoid putting arch-specific fixes into the generic code whenever possible. And in this case, not only is it possible, it's been done and tested. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Tue, 29 Jan 2008 09:15:02 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: Greg, The problem with Arjan's patch, if I understand it correctly, is that it requires drivers to make a call to access extended PCI config space. And, IIRC, Arjan's patch encumbers drivers for all arch's, even those that have no MMCONFIG problems. The patches proposed by Loic, Ivan, Matthew, and myself, all address the problem in an x86-specific manner that is transparent to the drivers. this is not quite correct; the patches from Loic, Ivan, Matthew and you are for a different problem statement. Your patch problem statement is "need to fix mmconfig", my patch problem statement is "need to not make users who don't need it suffer". These are orthogonal problems. Yes, but your patch also makes users who need extended PCI config space suffer. Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers for non-x86 archs with an additional call to access space that they've never had a problem with. As more PCI express drivers start to take advantage of AER and other advanced express capabilities, the extra call to address a condition specific to legacy x86 hardware is, IMNSHO, a kludge. The patches submitted by the others fix the problems with MMCONFIG without encumbering the drivers to be aware of any difference between legacy config space and extended config space. I have tested these patches on a number of systems exhibiting various MMCONFIG- related pathologies, and they work. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Greg KH wrote: On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote: I'm more optimistic because we've so severely restricted the use of mmconf after these patches that it's unlikely to cause problems. I also hear Vista is now using mmconf, so fewer implementations are going to be buggy at this point. Hahahaha, oh, that's a good one... But what about the thousands of implementations out there that are buggy? I'm with Arjan here, I'm very skeptical. Matthew, with Arjan's patch, is anything that currently works now broken? Why do you feel it is somehow "wrong"? thanks, greg k-h Greg, The problem with Arjan's patch, if I understand it correctly, is that it requires drivers to make a call to access extended PCI config space. And, IIRC, Arjan's patch encumbers drivers for all arch's, even those that have no MMCONFIG problems. The patches proposed by Loic, Ivan, Matthew, and myself, all address the problem in an x86-specific manner that is transparent to the drivers. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Tue, 29 Jan 2008 10:15:45 -0500 Tony Camuso [EMAIL PROTECTED] wrote: specific to legacy x86 hardware is, IMNSHO, a kludge. in addition to pci_enable(), pci_enable_msi(), pci_enable_busmaster() they already need to do to enable various features? These calls are related to generic aspects of the PCI* landscape itself and are not related to any arch-specific hardware, nor were they devised to address chipset-specific or BIOS-specific problems. For the good of all, we should endeavor to avoid putting arch-specific fixes into the generic code whenever possible. And in this case, not only is it possible, it's been done and tested. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Greg KH wrote: On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote: I'm more optimistic because we've so severely restricted the use of mmconf after these patches that it's unlikely to cause problems. I also hear Vista is now using mmconf, so fewer implementations are going to be buggy at this point. Hahahaha, oh, that's a good one... But what about the thousands of implementations out there that are buggy? I'm with Arjan here, I'm very skeptical. Matthew, with Arjan's patch, is anything that currently works now broken? Why do you feel it is somehow wrong? thanks, greg k-h Greg, The problem with Arjan's patch, if I understand it correctly, is that it requires drivers to make a call to access extended PCI config space. And, IIRC, Arjan's patch encumbers drivers for all arch's, even those that have no MMCONFIG problems. The patches proposed by Loic, Ivan, Matthew, and myself, all address the problem in an x86-specific manner that is transparent to the drivers. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Tue, 29 Jan 2008 09:15:02 -0500 Tony Camuso [EMAIL PROTECTED] wrote: Greg, The problem with Arjan's patch, if I understand it correctly, is that it requires drivers to make a call to access extended PCI config space. And, IIRC, Arjan's patch encumbers drivers for all arch's, even those that have no MMCONFIG problems. The patches proposed by Loic, Ivan, Matthew, and myself, all address the problem in an x86-specific manner that is transparent to the drivers. this is not quite correct; the patches from Loic, Ivan, Matthew and you are for a different problem statement. Your patch problem statement is need to fix mmconfig, my patch problem statement is need to not make users who don't need it suffer. These are orthogonal problems. Yes, but your patch also makes users who need extended PCI config space suffer. Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers for non-x86 archs with an additional call to access space that they've never had a problem with. As more PCI express drivers start to take advantage of AER and other advanced express capabilities, the extra call to address a condition specific to legacy x86 hardware is, IMNSHO, a kludge. The patches submitted by the others fix the problems with MMCONFIG without encumbering the drivers to be aware of any difference between legacy config space and extended config space. I have tested these patches on a number of systems exhibiting various MMCONFIG- related pathologies, and they work. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Matthew Wilcox wrote: On Tue, Jan 29, 2008 at 07:29:51AM -0800, Arjan van de Ven wrote: Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers for non-x86 archs with an additional call to access space that they've never had a problem with. lets say s/x86/x86, IA64 and architectures that use intel, amd or via chipsets/ Umm .. ia64 already does exactly what I'm proposing for x86. It uses one SAL interface for bytes below 256 and a different SAL interface for bytes 256-4095. Not exactly. :) The interface is the same, ia64_sal_pci_config_write() and ia64_sal_pci_config_read(), but a flag bit in the mode argument is used to tell the SAL interface whether to translate the offset component of the config address as having 8 or 12 bits of of displacement. In my estimation, Ivan's patch, in his implementation of Loic's suggestion, is even more elegant, since there is no need to flag whether the access is for offsets below 256. Ivan's code automatically uses Port IO (or equivalent with Matthew's patch) for offsets below 256 and MMCONFIG for offsets from 256 to 4096. And even better, it removes the bitmap that tracks MMCONFIG-unfriendly devices for the first 16 buses, a solution that assumes systems with bus numbers higher than 16 will get MMCONFIG right, which turned out to be a very wrong assumption. Furthermore, the config address is translated by the Northbridge. The delivery mechanism to the Northbridge, whether Port IO or MMCONFIG, is utterly opaque to the devices on the bus, since all they see is PCI config cycles, not Port IO or MMCONFIG cycles. The test only needed to be made at the Northbridge level, not at the device level. Ivan's patch removes all this cruft. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Greg, Have you given Grant's suggestion any further consideration? I'd like to know how the MMCONFIG issues discussed in this thread are going to be handled upstream. I have a patch implemented in RHEL 5.2, but I would rather have the upstream patch implemented, whatever it is. Grant Grundler wrote: On Tue, Jan 15, 2008 at 10:56:41AM -0700, Matthew Wilcox wrote: On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote: But so far, we have a zillion patches floating around, claiming different things, some with signed-off-bys and others without, so for now, I'll just stick with Arjan's patch in -mm and see if anyone complains about those releases... I complain about Arjan's patch. For reasons which have been adequately gone into already in this thread. Agreed. Greg, I think at least two better alternatives were proposed already. Please review the thread again. grant -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Greg, Have you given Grant's suggestion any further consideration? I'd like to know how the MMCONFIG issues discussed in this thread are going to be handled upstream. I have a patch implemented in RHEL 5.2, but I would rather have the upstream patch implemented, whatever it is. Grant Grundler wrote: On Tue, Jan 15, 2008 at 10:56:41AM -0700, Matthew Wilcox wrote: On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote: But so far, we have a zillion patches floating around, claiming different things, some with signed-off-bys and others without, so for now, I'll just stick with Arjan's patch in -mm and see if anyone complains about those releases... I complain about Arjan's patch. For reasons which have been adequately gone into already in this thread. Agreed. Greg, I think at least two better alternatives were proposed already. Please review the thread again. grant -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
I agree with Matthew. My preference is Ivan's patch using Loic's proposal. My patch would have tested MMCONFIG before using it, but it didn't fix the problem where the decode of large displacement devices can overlap the MMCONFIG region. Ivan's patch fixes that, and the problem of Northbridges that don't respond to MMCONFIG and as a bonus cleans out some code rendered unnecessary by his patch. Linus is confident that conf1 is not going away for at least the next five years. Matthew Wilcox wrote: On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote: But so far, we have a zillion patches floating around, claiming different things, some with signed-off-bys and others without, so for now, I'll just stick with Arjan's patch in -mm and see if anyone complains about those releases... I complain about Arjan's patch. For reasons which have been adequately gone into already in this thread. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
I agree with Matthew. My preference is Ivan's patch using Loic's proposal. My patch would have tested MMCONFIG before using it, but it didn't fix the problem where the decode of large displacement devices can overlap the MMCONFIG region. Ivan's patch fixes that, and the problem of Northbridges that don't respond to MMCONFIG and as a bonus cleans out some code rendered unnecessary by his patch. Linus is confident that conf1 is not going away for at least the next five years. Matthew Wilcox wrote: On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote: But so far, we have a zillion patches floating around, claiming different things, some with signed-off-bys and others without, so for now, I'll just stick with Arjan's patch in -mm and see if anyone complains about those releases... I complain about Arjan's patch. For reasons which have been adequately gone into already in this thread. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: it's not pci_enable_mmconf(), it's pci_enable_extended_config_space... it's independent of the mechanism! Arjan, you would be foisting this call on device drivers running on arches that don't need any such distinction between extended config space and < 256 bytes. I still think it's a bad policy. Let's endeavor to confine arch-specific quirks to the arch-specific code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Mon, 14 Jan 2008 08:01:01 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: >> If we're going to differentiate MMCONFIG from some other access mechanism, it only needs to be done at the Northbridge level. Devices are electrically ignorant of the protocol used between CPU and Northbridge to get the Northbridge to assert config cycles on the bus. Again this is about having systems that don't need extended config space not use it. At all. The only way to do that is have the drivers say they need it, and not use it otherwise. It has NOTHING to do with how things are wired up. It's pure a kernel level policy decision about whether to use extended config space AT ALL. The problem with compelling device drivers to determine the PCI config mechanism is that it must be forced upon arches that have no PCI configuration quirks or don't even use the same PCI config mechanisms as x86. I don't think that's a good policy. Better to confine arch-specific quirks to the arch-specific code whenever possible. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Sun, 13 Jan 2008 22:29:23 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: . There is no need to provide different PCI config access mechanisms at device granularity, since the PCI config access mechanism between the CPU and the Northbridge is opaque to the devices. PCI config mechanisms only need to differ at the Northbridge level. This ignores the "lets make it not matter for the 99% of the users" case. I don't understand. If we're going to differentiate MMCONFIG from some other access mechanism, it only needs to be done at the Northbridge level. Devices are electrically ignorant of the protocol used between CPU and Northbridge to get the Northbridge to assert config cycles on the bus. . If the system is capable of conf1, then PCI config access at offsets < 256 should be confined to conf1. This solution is most effective for existing and legacy systems. not "conf1" but "what the platform thinks is the best method for < 256". We have this nice abstraction for the platform to select the best method... we should use it. Agreed. So we have Loic and Ivan's patch limiting MMCONFIG accesses to offsets >= 256. And we have Matthew's patch that abstracts the method for config accesses to offsets < 256. I beleive Matthew has already tested these patches for functionality on x86. All that's needed is to test for regressions on other arches. Is there any interest in providing the following? 1. The ability to use MMCONFIG for all accesses on systems that have no problems with MMCONFIG. 2. For systems using both PCI and PCI express, testing each bus for MMCONFIG compliance, to determine whether MMCONFIG can be used for all config accesses or whether the bus must be limited all to the method abstracted for offsets < 256. Or does that introduce unnecessary complications? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Sun, 13 Jan 2008 22:29:23 -0500 Tony Camuso [EMAIL PROTECTED] wrote: . There is no need to provide different PCI config access mechanisms at device granularity, since the PCI config access mechanism between the CPU and the Northbridge is opaque to the devices. PCI config mechanisms only need to differ at the Northbridge level. This ignores the lets make it not matter for the 99% of the users case. I don't understand. If we're going to differentiate MMCONFIG from some other access mechanism, it only needs to be done at the Northbridge level. Devices are electrically ignorant of the protocol used between CPU and Northbridge to get the Northbridge to assert config cycles on the bus. . If the system is capable of conf1, then PCI config access at offsets 256 should be confined to conf1. This solution is most effective for existing and legacy systems. not conf1 but what the platform thinks is the best method for 256. We have this nice abstraction for the platform to select the best method... we should use it. Agreed. So we have Loic and Ivan's patch limiting MMCONFIG accesses to offsets = 256. And we have Matthew's patch that abstracts the method for config accesses to offsets 256. I beleive Matthew has already tested these patches for functionality on x86. All that's needed is to test for regressions on other arches. Is there any interest in providing the following? 1. The ability to use MMCONFIG for all accesses on systems that have no problems with MMCONFIG. 2. For systems using both PCI and PCI express, testing each bus for MMCONFIG compliance, to determine whether MMCONFIG can be used for all config accesses or whether the bus must be limited all to the method abstracted for offsets 256. Or does that introduce unnecessary complications? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: it's not pci_enable_mmconf(), it's pci_enable_extended_config_space... it's independent of the mechanism! Arjan, you would be foisting this call on device drivers running on arches that don't need any such distinction between extended config space and 256 bytes. I still think it's a bad policy. Let's endeavor to confine arch-specific quirks to the arch-specific code. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
To all ... Well, here is what I perceive we've got so far. . Some PCI Northbridges do not work with MMCONFIG. . Some PCI BARs can overlap the MMCONFIG area during bus sizing. It is hoped that new BIOSes will locate MMCONFIG in an area safely out of the way of bus sizing code, but there can be no guarantees. . conf1 is going away in newer x86 implementations in the not too distant future. . The PCI express spec requires platforms to provide access to the extended config area, and there are express devices today using that area for AER. . There is no need to provide different PCI config access mechanisms at device granularity, since the PCI config access mechanism between the CPU and the Northbridge is opaque to the devices. PCI config mechanisms only need to differ at the Northbridge level. . We have a flurry of patches all claiming to solve all or some of these problems. Arjan, I realize it may not be possible for you to answer this question, but I feel compelled to ask it anyway. Is it possible that future x86 architectures will be implementing a SAL-like interface to abstract PCI config access altogether? Or can we condense these patches down to a set that does the following? . If the system is capable of conf1, then PCI config access at offsets < 256 should be confined to conf1. This solution is most effective for existing and legacy systems. . If the system does not support MMCONFIG, of if MMCONFIG is not working, then accesses to offsets > 256 return -1 and an error status. . For systems, where the conf1 mechanism is NOT available, then MMCONFIG should be the PCI access mechanism for all offsets. For such systems, we must assume that the BIOS has become smart enough to locate MMCONFIG in a region safe from encroachment by bus sizing code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: The PCI spec provides for conf1 as an architected solution. It's not going away, and especially not in x86 land where Port IO is built-in to the CPU. again sadly you're wrong. As someone gently pointed out to me, you are in a position to know this, so I probably am wrong. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Sat, 12 Jan 2008 20:36:59 -0500 Tony Camuso <[EMAIL PROTECTED]> wrote: Just about NOBODY has devices that need the extended config space. At all. The PCI express spec requires the platform to provide access to this space for express-compliance. More devices will be using this space as express becomes the dominant IO bus technology. As far as the device is concerned, after the Northbridge translates the config access into PCI bus cycles, the device has no idea what mechanism drove the Northbridge to the translation. Wanne bet there'll be devices that screw this up? THere's devices that even screwed up the 64-256 region after all. There may have been devices that incorrectly applied the PCI spec to various fields in the header, I'll grant you that. However, there is no way a device can determine electrically whether the Northbridge received Port IO or MMCONFIG cycles. This is between the CPU and the Northbridge and is utterly opaque to the devices on the bus. The patch I devised concerned itself with Northbridges and separated MMCONFIG-compliant buses from those that could not handle MMCONFIG. THis kind of patchup has been going on for the better part of a year (well 2 years) by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have been proposed as "alternative" to my approach. Which is why Loic's proposal and Ivan's implementation of it is so elegant. It solves all these problems in one sweep, and eliminates the code rendered cruft by Ivan's patch. A two-fer, by my reckoning. In other words, for x86, I don't think we need to worry about Port IO config access ever going away at all. You're wrong there. Sad to say, but you're wrong there. The PCI spec provides for conf1 as an architected solution. It's not going away, and especially not in x86 land where Port IO is built-in to the CPU. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: On Sat, 12 Jan 2008 20:36:59 -0500 Tony Camuso [EMAIL PROTECTED] wrote: Just about NOBODY has devices that need the extended config space. At all. The PCI express spec requires the platform to provide access to this space for express-compliance. More devices will be using this space as express becomes the dominant IO bus technology. As far as the device is concerned, after the Northbridge translates the config access into PCI bus cycles, the device has no idea what mechanism drove the Northbridge to the translation. Wanne bet there'll be devices that screw this up? THere's devices that even screwed up the 64-256 region after all. There may have been devices that incorrectly applied the PCI spec to various fields in the header, I'll grant you that. However, there is no way a device can determine electrically whether the Northbridge received Port IO or MMCONFIG cycles. This is between the CPU and the Northbridge and is utterly opaque to the devices on the bus. The patch I devised concerned itself with Northbridges and separated MMCONFIG-compliant buses from those that could not handle MMCONFIG. THis kind of patchup has been going on for the better part of a year (well 2 years) by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have been proposed as alternative to my approach. Which is why Loic's proposal and Ivan's implementation of it is so elegant. It solves all these problems in one sweep, and eliminates the code rendered cruft by Ivan's patch. A two-fer, by my reckoning. In other words, for x86, I don't think we need to worry about Port IO config access ever going away at all. You're wrong there. Sad to say, but you're wrong there. The PCI spec provides for conf1 as an architected solution. It's not going away, and especially not in x86 land where Port IO is built-in to the CPU. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan van de Ven wrote: The PCI spec provides for conf1 as an architected solution. It's not going away, and especially not in x86 land where Port IO is built-in to the CPU. again sadly you're wrong. As someone gently pointed out to me, you are in a position to know this, so I probably am wrong. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
To all ... Well, here is what I perceive we've got so far. . Some PCI Northbridges do not work with MMCONFIG. . Some PCI BARs can overlap the MMCONFIG area during bus sizing. It is hoped that new BIOSes will locate MMCONFIG in an area safely out of the way of bus sizing code, but there can be no guarantees. . conf1 is going away in newer x86 implementations in the not too distant future. . The PCI express spec requires platforms to provide access to the extended config area, and there are express devices today using that area for AER. . There is no need to provide different PCI config access mechanisms at device granularity, since the PCI config access mechanism between the CPU and the Northbridge is opaque to the devices. PCI config mechanisms only need to differ at the Northbridge level. . We have a flurry of patches all claiming to solve all or some of these problems. Arjan, I realize it may not be possible for you to answer this question, but I feel compelled to ask it anyway. Is it possible that future x86 architectures will be implementing a SAL-like interface to abstract PCI config access altogether? Or can we condense these patches down to a set that does the following? . If the system is capable of conf1, then PCI config access at offsets 256 should be confined to conf1. This solution is most effective for existing and legacy systems. . If the system does not support MMCONFIG, of if MMCONFIG is not working, then accesses to offsets 256 return -1 and an error status. . For systems, where the conf1 mechanism is NOT available, then MMCONFIG should be the PCI access mechanism for all offsets. For such systems, we must assume that the BIOS has become smart enough to locate MMCONFIG in a region safe from encroachment by bus sizing code. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Thanks, Arjan. The problem we have been experiencing has to do with Northbridges, not with devices. As far as the device is concerned, after the Northbridge translates the config access into PCI bus cycles, the device has no idea what mechanism drove the Northbridge to the translation. That is to say, the device does not know whether the config cycle on the bus was caused by an MMCONFIG cycle or a legacy Port IO cycle delivered to the Northbridge. In systems that had Northbridges that did not respond correctly to MMCONFIG cycles, like the AMD 8132, we (HP & RH) were blacklisting whole platforms to limit them to Port IO PCI config. However, when platforms emerged using both legacy PCI and PCI express, the platforms that were limited to Port IO config cycles were not express compliant, since the express spec requires the platform to be able to address the full 4096 byte region of config space to be considered express-compliant. The patch I devised concerned itself with Northbridges and separated MMCONFIG-compliant buses from those that could not handle MMCONFIG. Therefore, the express bus in the platform could happily employ MMCONFIG to access the entire 4K region, while the legacy bus with the non-compliant Northbridge could be restricted to Port IO config. However, even with my patch, the problem remained where devices requiring large displacements could overlap the BIOS-mapped MMCONFIG region. In such a situation, where the bus has passed the MMCONFIG test, the MMCONFIG region can get doubly mapped by bus-sizing code, causing the system to hang. The remedy proposed by Loic and implemented by Ivan is actually quite elegant, in that it addresses all these problems quite effectively while eliminating a ration of specialized and somewhat obscure code. In my humble opinion, Port IO config access is here to stay, having been defined as an architected mechanism in the PCI 2.1 spec. This is most especially true for x86. In other words, for x86, I don't think we need to worry about Port IO config access ever going away at all. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Arjan, I have not seen your MMCONFIG patch. Would you mind sending me a copy? Thanks. Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
Thanks, Arjan. The problem we have been experiencing has to do with Northbridges, not with devices. As far as the device is concerned, after the Northbridge translates the config access into PCI bus cycles, the device has no idea what mechanism drove the Northbridge to the translation. That is to say, the device does not know whether the config cycle on the bus was caused by an MMCONFIG cycle or a legacy Port IO cycle delivered to the Northbridge. In systems that had Northbridges that did not respond correctly to MMCONFIG cycles, like the AMD 8132, we (HP RH) were blacklisting whole platforms to limit them to Port IO PCI config. However, when platforms emerged using both legacy PCI and PCI express, the platforms that were limited to Port IO config cycles were not express compliant, since the express spec requires the platform to be able to address the full 4096 byte region of config space to be considered express-compliant. The patch I devised concerned itself with Northbridges and separated MMCONFIG-compliant buses from those that could not handle MMCONFIG. Therefore, the express bus in the platform could happily employ MMCONFIG to access the entire 4K region, while the legacy bus with the non-compliant Northbridge could be restricted to Port IO config. However, even with my patch, the problem remained where devices requiring large displacements could overlap the BIOS-mapped MMCONFIG region. In such a situation, where the bus has passed the MMCONFIG test, the MMCONFIG region can get doubly mapped by bus-sizing code, causing the system to hang. The remedy proposed by Loic and implemented by Ivan is actually quite elegant, in that it addresses all these problems quite effectively while eliminating a ration of specialized and somewhat obscure code. In my humble opinion, Port IO config access is here to stay, having been defined as an architected mechanism in the PCI 2.1 spec. This is most especially true for x86. In other words, for x86, I don't think we need to worry about Port IO config access ever going away at all. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Fwd: Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in]
Sorry, Meant to press reply/all. Forwarded Message From: Tony Camuso <[EMAIL PROTECTED]> To: Greg KH <[EMAIL PROTECTED]> Subject: Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in Date: Fri, 11 Jan 2008 20:58:52 -0500 Greg KH wrote: > Ivan, you posted one a while ago, but never seemed to get any > confirmation if it helped or not. Should I use that and drop Arjan's? > Or use both? Or something else like the patches proposed by Tony > Camuso? The 5-patch set I submitted is for Northbridges that don't respond to MMCONFIG cycles at all. We (RH & HP) were blacklisting boxes in RHEL, limiting them to legacy PCI, platform-wide. This was bad for systems that had both PCI legacy and PCI express buses, because it limited the functionality of the PCI express buses. The problem Matthew described is different, having to do with bus sizing code causing the large displacement of the graphics chip to overlap the decode for MMCONFIG space. Ivan suggested a fix for this problem that limits offsets below 64 bytes to lgecacy PCI config access. I tried this and it works perfectly! I submitted an informal patch for folks to try. Here it is again. diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 1bf5816..4474979 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -73,7 +73,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!base) || (reg < 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); spin_lock_irqsave(_config_lock, flags); @@ -106,7 +106,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!base) || (reg < 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); spin_lock_irqsave(_config_lock, flags); diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index 4095e4d..4ad1fcb 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg < 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); switch (len) { @@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg < 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); switch (len) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5][V2]PCI: x86 MMCONFIG: Preamble
Greg, Any thoughts? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5][V2]PCI: x86 MMCONFIG: Preamble
Greg, Any thoughts? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Fwd: Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in]
Sorry, Meant to press reply/all. Forwarded Message From: Tony Camuso [EMAIL PROTECTED] To: Greg KH [EMAIL PROTECTED] Subject: Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in Date: Fri, 11 Jan 2008 20:58:52 -0500 Greg KH wrote: Ivan, you posted one a while ago, but never seemed to get any confirmation if it helped or not. Should I use that and drop Arjan's? Or use both? Or something else like the patches proposed by Tony Camuso? The 5-patch set I submitted is for Northbridges that don't respond to MMCONFIG cycles at all. We (RH HP) were blacklisting boxes in RHEL, limiting them to legacy PCI, platform-wide. This was bad for systems that had both PCI legacy and PCI express buses, because it limited the functionality of the PCI express buses. The problem Matthew described is different, having to do with bus sizing code causing the large displacement of the graphics chip to overlap the decode for MMCONFIG space. Ivan suggested a fix for this problem that limits offsets below 64 bytes to lgecacy PCI config access. I tried this and it works perfectly! I submitted an informal patch for folks to try. Here it is again. diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 1bf5816..4474979 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -73,7 +73,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!base) || (reg 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); spin_lock_irqsave(pci_config_lock, flags); @@ -106,7 +106,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!base) || (reg 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); spin_lock_irqsave(pci_config_lock, flags); diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index 4095e4d..4ad1fcb 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); switch (len) { @@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); switch (len) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5][V2]PCI: x86 MMCONFIG: Preamble
ets within the pci config header (< 0x40) to legacy pci config mechanism. That would fix this problem without impacting devices that use control and satus register space above the header. I tried that, and it worked, but I believe that such a patch is beyond the scope of the problem this patch-set is intended to confront. Perhaps such a patch will be added after more discussion on lkml. Of course, the correct solution would be for the BIOS to assure that MMCONFIG space, and other such reserved MMIO areas, are well out of the reach of MMIO that can be claimed by any PCI device. Why this patch-set is needed MMCONFIG accesses are necessary to reach extended PCI config space of PCI Express (PCIe) devices. Systems that cannot do this are not PCIe compliant. Using "pci=nommconf" when only a subset of the buses on a given platform need to be constrained to Legacy PCI Config accesses, takes the whole platform out of compliance with the PCI Express spec. In most cases, only Legacy PCI buses need to be constrained to Legacy PCI Config accesses, so that the PCI Express buses in the platform could comply with the Express spec. This patch set provides a method whereby the Express buses can still employ MMCONFIG accesses while the Legacy buses are constrained to Legacy PCI Config accesess. This patch is not capable of detecting devices that throw machine checks when using MMCONFIG to access them. For example, he 830M/Mg graphics chipset throws a machine check exception when writing to its Base Address Register at offset 0x18 in its PCI config space. There may be, and probably are, other devices that misbehave in this manner. The solution for systems using such devices is to use "pci=nommconf" at the boot command as a workaround. This limits the whole system to Legacy PCI Config access, and puts PCIe extended configuration space out of reach, but at least the system can boot. Testing === This patch-set was tested on a variety of x86 platforms. Code was instrumented to trace execution to certify that the patch did what it was intended to do. The patch-set successfully detected non-compliant devices and was able to correctly assign Legacy PCI Config access to buses serving these devices while allowing other buses in the system to continue to use of every device discovered during the PCI probing sequence. MMCONFIG. The patch was also tested on non-x86 platforms to assure that there were no build problems or regressions. Signed-off-by: Tony Camuso <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG
Thanks, Greg. Have a safe flight! On Tue, 2008-01-08 at 05:36 -0800, Greg KH wrote: > On Tue, Jan 08, 2008 at 08:14:22AM -0500, Tony Camuso wrote: > > I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG * > > in the subject line, where the ? is replaced by the number of the > > patch within the set and the * is replaced by a brief description, > > if that's acceptable. > > That sounds great. > > > I can have it ready in a few hours. > > I'll be on a plane for a few hours, so don't feel the need to rush :) > > thanks, > > greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG
I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG * in the subject line, where the ? is replaced by the number of the patch within the set and the * is replaced by a brief description, if that's acceptable. I can have it ready in a few hours. On Mon, 2008-01-07 at 20:56 -0800, Greg KH wrote: > On Mon, Jan 07, 2008 at 10:20:23PM -0500, Tony Camuso wrote: > > Greg, > > > > Have you given this patch-set any more consideration? > > Which patch-set, there have been a number of them :) > > > I've submitted the changes you requested. > > Care to respin them all so I'm not confused? > > thanks, > > greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG
I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG * in the subject line, where the ? is replaced by the number of the patch within the set and the * is replaced by a brief description, if that's acceptable. I can have it ready in a few hours. On Mon, 2008-01-07 at 20:56 -0800, Greg KH wrote: On Mon, Jan 07, 2008 at 10:20:23PM -0500, Tony Camuso wrote: Greg, Have you given this patch-set any more consideration? Which patch-set, there have been a number of them :) I've submitted the changes you requested. Care to respin them all so I'm not confused? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG
Thanks, Greg. Have a safe flight! On Tue, 2008-01-08 at 05:36 -0800, Greg KH wrote: On Tue, Jan 08, 2008 at 08:14:22AM -0500, Tony Camuso wrote: I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG * in the subject line, where the ? is replaced by the number of the patch within the set and the * is replaced by a brief description, if that's acceptable. That sounds great. I can have it ready in a few hours. I'll be on a plane for a few hours, so don't feel the need to rush :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5][V2]PCI: x86 MMCONFIG: Preamble
the pci config header ( 0x40) to legacy pci config mechanism. That would fix this problem without impacting devices that use control and satus register space above the header. I tried that, and it worked, but I believe that such a patch is beyond the scope of the problem this patch-set is intended to confront. Perhaps such a patch will be added after more discussion on lkml. Of course, the correct solution would be for the BIOS to assure that MMCONFIG space, and other such reserved MMIO areas, are well out of the reach of MMIO that can be claimed by any PCI device. Why this patch-set is needed MMCONFIG accesses are necessary to reach extended PCI config space of PCI Express (PCIe) devices. Systems that cannot do this are not PCIe compliant. Using pci=nommconf when only a subset of the buses on a given platform need to be constrained to Legacy PCI Config accesses, takes the whole platform out of compliance with the PCI Express spec. In most cases, only Legacy PCI buses need to be constrained to Legacy PCI Config accesses, so that the PCI Express buses in the platform could comply with the Express spec. This patch set provides a method whereby the Express buses can still employ MMCONFIG accesses while the Legacy buses are constrained to Legacy PCI Config accesess. This patch is not capable of detecting devices that throw machine checks when using MMCONFIG to access them. For example, he 830M/Mg graphics chipset throws a machine check exception when writing to its Base Address Register at offset 0x18 in its PCI config space. There may be, and probably are, other devices that misbehave in this manner. The solution for systems using such devices is to use pci=nommconf at the boot command as a workaround. This limits the whole system to Legacy PCI Config access, and puts PCIe extended configuration space out of reach, but at least the system can boot. Testing === This patch-set was tested on a variety of x86 platforms. Code was instrumented to trace execution to certify that the patch did what it was intended to do. The patch-set successfully detected non-compliant devices and was able to correctly assign Legacy PCI Config access to buses serving these devices while allowing other buses in the system to continue to use of every device discovered during the PCI probing sequence. MMCONFIG. The patch was also tested on non-x86 platforms to assure that there were no build problems or regressions. Signed-off-by: Tony Camuso [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]
On Thu, 2007-12-20 at 22:50 +0300, Ivan Kokshaysky wrote: > Use type 1 just for the first 64 bytes and tg3 will be happy. All we need > is to avoid touching BARs with mmconfig. > > Ivan. I've tried Ivan's suggestion, and it works. The patch is appended below. My question is, do we want to incorporate this full-time upstream? It would fix a lot of existing and potential problems, and the cost is very small. Regards, Tony diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 1bf5816..42e7d4a 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -73,7 +73,8 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!addr) || (reg < 0x40)) + return pci_conf1_read(seg,bus,devfn,reg,len,value); spin_lock_irqsave(_config_lock, flags); @@ -106,7 +107,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!addr) || (reg < 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); spin_lock_irqsave(_config_lock, flags); diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index 4095e4d..4ad1fcb 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg < 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); switch (len) { @@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg < 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); switch (len) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG
Greg, Have you given this patch-set any more consideration? I've submitted the changes you requested. Regards, Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]
On Thu, 2007-12-20 at 22:50 +0300, Ivan Kokshaysky wrote: Use type 1 just for the first 64 bytes and tg3 will be happy. All we need is to avoid touching BARs with mmconfig. Ivan. I've tried Ivan's suggestion, and it works. The patch is appended below. My question is, do we want to incorporate this full-time upstream? It would fix a lot of existing and potential problems, and the cost is very small. Regards, Tony diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 1bf5816..42e7d4a 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -73,7 +73,8 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!addr) || (reg 0x40)) + return pci_conf1_read(seg,bus,devfn,reg,len,value); spin_lock_irqsave(pci_config_lock, flags); @@ -106,7 +107,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; base = get_base_addr(seg, bus, devfn); - if (!base) + if ((!addr) || (reg 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); spin_lock_irqsave(pci_config_lock, flags); diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index 4095e4d..4ad1fcb 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus, } addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg 0x40)) return pci_conf1_read(seg,bus,devfn,reg,len,value); switch (len) { @@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return -EINVAL; addr = pci_dev_base(seg, bus, devfn); - if (!addr) + if ((!addr) || (reg 0x40)) return pci_conf1_write(seg,bus,devfn,reg,len,value); switch (len) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fwd: Re: [PATCH 4/5]PCI: x86 MMCONFIG] introduce pcibios_fix_bus_scan_quirk()
I need to resubmit this patch. Resubmission will be in my next email. It was a minor problem whereby the patch would advise the user If a device isn't working, try "pci=nommconf". even when "pci=mmconf" was used at the command line. One additional line of logic fixed it, but I would like to resubmit the patch rather than patch the patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fwd: Re: [PATCH 4/5]PCI: x86 MMCONFIG] introduce pcibios_fix_bus_scan_quirk()
I need to resubmit this patch. Resubmission will be in my next email. It was a minor problem whereby the patch would advise the user If a device isn't working, try pci=nommconf. even when pci=mmconf was used at the command line. One additional line of logic fixed it, but I would like to resubmit the patch rather than patch the patch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG - a couple of corrections to the preamble
Some corrections to the preamble. The large amount of text is due to the nature of the problem and the discussion it engendered on the lkml. Should say: The large amount of text in the explanation below is due to the nature of the problem and the discussion engendered on lkml by my first submission. This patch-set detects the problem by comparing an MMCONFIG read to a Legacy PCI config read of the vendor/device dword of every device discovered during the PCI probing sequence. Actually, the patch doesn't do a pci config read of EVERY device discovered on every bus. Only on buses above a number defined by PCI_MMCFG_MAX_CHECK_BUS and whose parent bus's pci_bus.ops field points to pci_root_ops, which are mmconfig access routines in systems in which mmconfig is the default mechanism. In such cases where the parent bus pci_bus.ops field points to pci_legacy_ops, the current bus does not need to be probed, since its own pci_bus.ops field will inherit the pointer to pci_legacy_ops from its parent. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5]PCI: x86 MMCONFIG - a couple of corrections to the preamble
Some corrections to the preamble. The large amount of text is due to the nature of the problem and the discussion it engendered on the lkml. Should say: The large amount of text in the explanation below is due to the nature of the problem and the discussion engendered on lkml by my first submission. This patch-set detects the problem by comparing an MMCONFIG read to a Legacy PCI config read of the vendor/device dword of every device discovered during the PCI probing sequence. Actually, the patch doesn't do a pci config read of EVERY device discovered on every bus. Only on buses above a number defined by PCI_MMCFG_MAX_CHECK_BUS and whose parent bus's pci_bus.ops field points to pci_root_ops, which are mmconfig access routines in systems in which mmconfig is the default mechanism. In such cases where the parent bus pci_bus.ops field points to pci_legacy_ops, the current bus does not need to be probed, since its own pci_bus.ops field will inherit the pointer to pci_legacy_ops from its parent. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/