Re: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
Hello Ingo,

> So I've rewritten your changelog accordingly - see the attached patch.
>
> I have also added a Cc: stable tag.

Thanks a lot!

- Masayoshi

Tue, 13 Feb 2018 12:48:41 +0100 Ingo Molnar wrote:
>
> * Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>
>> From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
>>
>> When a physical cpu is hot-removed, the following warning messages
>> are shown while the uncore device is being removed in uncore_pci_remove().
>>
>> WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
>> uncore_pci_remove+0xf1/0x110
>> ...
>> CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>> ...
>> Call Trace:
>> pci_device_remove+0x36/0xb0
>> device_release_driver_internal+0x145/0x210
>> pci_stop_bus_device+0x76/0xa0
>> pci_stop_root_bus+0x44/0x60
>> acpi_pci_root_remove+0x1f/0x80
>> acpi_bus_trim+0x54/0x90
>> acpi_bus_trim+0x2e/0x90
>> acpi_device_hotplug+0x2bc/0x4b0
>> acpi_hotplug_work_fn+0x1a/0x30
>> process_one_work+0x141/0x340
>> worker_thread+0x47/0x3e0
>> kthread+0xf5/0x130
>>
>> When uncore_pci_remove() runs, it tries to get the package id to
>> clear the value of uncore_extra_pci_dev[].dev[] by using
>> topology_phys_to_logical_pkg(). The warning messages are
>> shown because topology_phys_to_logical_pkg() returns -1.
>>
>> arch/x86/events/intel/uncore.c:
>> static void uncore_pci_remove(struct pci_dev *pdev)
>> {
>>     ...
>>     phys_id = uncore_pcibus_to_physid(pdev->bus);
>>     ...
>>     pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>>     for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>>         if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>>             uncore_extra_pci_dev[pkg].dev[i] = NULL;
>>             break;
>>         }
>>     }
>>     WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); //HERE!!
>>
>> topology_phys_to_logical_pkg() tries to find
>> cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
>>
>> arch/x86/kernel/smpboot.c:
>> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>> {
>>     int cpu;
>>
>>     for_each_possible_cpu(cpu) {
>>         struct cpuinfo_x86 *c = &cpu_data(cpu);
>>
>>         if (c->initialized && c->phys_proc_id == phys_pkg)
>>             return c->logical_proc_id;
>>     }
>>     return -1;
>> }
>>
>> However, the phys_proc_id is already set to 0 by remove_siblinginfo()
>> when the cpu was offlined.
>> So, topology_phys_to_logical_pkg() cannot find the correct
>> logical_proc_id and always returns -1.
>> As a result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
>> messages are shown.
>>
>> To avoid this, remove the setting from remove_siblinginfo().
>> There is no influence on the removal because phys_proc_id is not
>> used after it is hot-removed and it is re-set while hot-adding.
>
> So I think this fix goes beyond fixing a 'warning', if we get -1 for 'pkg':
>
>> pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>> for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>>     if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>>         uncore_extra_pci_dev[pkg].dev[i] = NULL;
>
> ... then that creates two _real_ bugs AFAICS:
>
>  1) we dereference uncore_extra_pci_dev[] with a negative index
>
>  2) we fail to clean up a stale pointer in uncore_extra_pci_dev[][]
>
> So I've rewritten your changelog accordingly - see the attached patch.
>
> I have also added a Cc: stable tag.
>
> Thanks,
>
>     Ingo
>
> ===>
> From 295cc7eb314eb3321fb6d67ca6f7305f5c50d10f Mon Sep 17 00:00:00 2001
> From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
> Date: Thu, 8 Feb 2018 09:19:08 -0500
> Subject: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when
>  hot-removing a physical CPU
>
> When a physical CPU is hot-removed, the following warning messages
> are shown while the uncore device is removed in uncore_pci_remove():
>
> WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
> uncore_pci_remove+0xf1/0x110
> ...
> CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> ...
[PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
* Masayoshi Mizuma <msys.miz...@gmail.com> wrote:

> From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
>
> When a physical cpu is hot-removed, the following warning messages
> are shown while the uncore device is being removed in uncore_pci_remove().
>
> WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
> uncore_pci_remove+0xf1/0x110
> ...
> CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> ...
> Call Trace:
> pci_device_remove+0x36/0xb0
> device_release_driver_internal+0x145/0x210
> pci_stop_bus_device+0x76/0xa0
> pci_stop_root_bus+0x44/0x60
> acpi_pci_root_remove+0x1f/0x80
> acpi_bus_trim+0x54/0x90
> acpi_bus_trim+0x2e/0x90
> acpi_device_hotplug+0x2bc/0x4b0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x141/0x340
> worker_thread+0x47/0x3e0
> kthread+0xf5/0x130
>
> When uncore_pci_remove() runs, it tries to get the package id to
> clear the value of uncore_extra_pci_dev[].dev[] by using
> topology_phys_to_logical_pkg(). The warning messages are
> shown because topology_phys_to_logical_pkg() returns -1.
>
> arch/x86/events/intel/uncore.c:
> static void uncore_pci_remove(struct pci_dev *pdev)
> {
>     ...
>     phys_id = uncore_pcibus_to_physid(pdev->bus);
>     ...
>     pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>     for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>         if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>             uncore_extra_pci_dev[pkg].dev[i] = NULL;
>             break;
>         }
>     }
>     WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); //HERE!!
>
> topology_phys_to_logical_pkg() tries to find
> cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
>
> arch/x86/kernel/smpboot.c:
> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
> {
>     int cpu;
>
>     for_each_possible_cpu(cpu) {
>         struct cpuinfo_x86 *c = &cpu_data(cpu);
>
>         if (c->initialized && c->phys_proc_id == phys_pkg)
>             return c->logical_proc_id;
>     }
>     return -1;
> }
>
> However, the phys_proc_id is already set to 0 by remove_siblinginfo()
> when the cpu was offlined.
> So, topology_phys_to_logical_pkg() cannot find the correct
> logical_proc_id and always returns -1.
> As a result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
> messages are shown.
>
> To avoid this, remove the setting from remove_siblinginfo().
> There is no influence on the removal because phys_proc_id is not
> used after it is hot-removed and it is re-set while hot-adding.

So I think this fix goes beyond fixing a 'warning', if we get -1 for 'pkg':

> pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
> for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>     if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>         uncore_extra_pci_dev[pkg].dev[i] = NULL;

... then that creates two _real_ bugs AFAICS:

 1) we dereference uncore_extra_pci_dev[] with a negative index

 2) we fail to clean up a stale pointer in uncore_extra_pci_dev[][]

So I've rewritten your changelog accordingly - see the attached patch.

I have also added a Cc: stable tag.

Thanks,

    Ingo

=======>

From 295cc7eb314eb3321fb6d67ca6f7305f5c50d10f Mon Sep 17 00:00:00 2001
From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Date: Thu, 8 Feb 2018 09:19:08 -0500
Subject: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when
 hot-removing a physical CPU

When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is removed in uncore_pci_remove():

WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
uncore_pci_remove+0xf1/0x110
...
CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
...
Call Trace:
pci_device_remove+0x36/0xb0
device_release_driver_internal+0x145/0x210
pci_stop_bus_device+0x76/0xa0
pci_stop_root_bus+0x44/0x60
acpi_pci_root_remove+0x1f/0x80
acpi_bus_trim+0x54/0x90
acpi_bus_trim+0x2e/0x90
acpi_device_hotplug+0x2bc/0x4b0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x141/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130

When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messages are shown
because topology_phys_to_logical_pkg() returns -1.