[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
This bug was fixed in the package linux - 4.13.0-36.40

---
linux (4.13.0-36.40) artful; urgency=medium

  * linux: 4.13.0-36.40 -proposed tracker (LP: #1750010)

  * Rebuild without "CVE-2017-5754 ARM64 KPTI fixes" patch set

linux (4.13.0-35.39) artful; urgency=medium

  * linux: 4.13.0-35.39 -proposed tracker (LP: #1748743)

  * CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: turn off IBPB when full retpoline is present"
    - SAUCE: turn off IBRS when full retpoline is present
    - [Packaging] retpoline files must be sorted
    - [Packaging] pull in retpoline files

linux (4.13.0-34.37) artful; urgency=medium

  * linux: 4.13.0-34.37 -proposed tracker (LP: #1748475)

  * libata: apply MAX_SEC_1024 to all LITEON EP1 series devices (LP: #1743053)
    - libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

  * KVM patches for s390x to provide facility bits 81 (ppa15) and 82 (bpb) (LP: #1747090)
    - KVM: s390: wire up bpb feature

  * artful 4.13 i386 kernels crash after memory hotplug remove (LP: #1747069)
    - Revert "mm, memory_hotplug: do not associate hotadded memory to zones until online"

  * CVE-2017-5715 (Spectre v2 Intel)
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
    - x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control
    - x86/cpu/AMD: Add speculative control support for AMD
    - x86/microcode: Extend post microcode reload to support IBPB feature
    - KVM: SVM: Do not intercept new speculative control MSRs
    - x86/svm: Set IBRS value on VM entry and exit
    - x86/svm: Set IBPB when running a different VCPU
    - KVM: x86: Add speculative control CPUID support for guests
    - SAUCE: turn off IBPB when full retpoline is present

  * Artful 4.13 fixes for tun (LP: #1748846)
    - tun: call dev_get_valid_name() before register_netdevice()
    - tun: allow positive return values on dev_get_valid_name() call
    - tun/tap: sanitize TUNSETSNDBUF input

  * boot failure on AMD Raven + WestonXT (LP: #1742759)
    - SAUCE: drm/amdgpu: add atpx quirk handling (v2)

linux (4.13.0-33.36) artful; urgency=low

  * linux: 4.13.0-33.36 -proposed tracker (LP: #1746903)

  [ Stefan Bader ]

  * starting VMs causing retpoline4 to reboot (LP: #1747507) // CVE-2017-5715 (Spectre v2 retpoline)
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    - x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
    - x86/retpoline: Remove the esp/rsp thunk
    - x86/retpoline: Simplify vmexit_fill_RSB()

  * Missing install-time driver for QLogic QED 25/40/100Gb Ethernet NIC (LP: #1743638)
    - [d-i] Add qede to nic-modules udeb

  * hisi_sas: driver robustness fixes (LP: #1739807)
    - scsi: hisi_sas: fix reset and port ID refresh issues
    - scsi: hisi_sas: avoid potential v2 hw interrupt issue
    - scsi: hisi_sas: fix v2 hw underflow residual value
    - scsi: hisi_sas: add v2 hw DFX feature
    - scsi: hisi_sas: add irq and tasklet cleanup in v2 hw
    - scsi: hisi_sas: service interrupt ITCT_CLR interrupt in v2 hw
    - scsi: hisi_sas: fix internal abort slot timeout bug
    - scsi: hisi_sas: us start_phy in PHY_FUNC_LINK_RESET
    - scsi: hisi_sas: fix NULL check in SMP abort task path
    - scsi: hisi_sas: fix the risk of freeing slot twice
    - scsi: hisi_sas: kill tasklet when destroying irq in v3 hw
    - scsi: hisi_sas: complete all tasklets prior to host reset

  * [Artful/Zesty] ACPI APEI error handling bug fixes (LP: #1732990)
    - ACPI: APEI: fix the wrong iteration of generic error status block
    - ACPI / APEI: clear error status before acknowledging the error

  * [Zesty/Artful] On ARM64 PCIE physical function passthrough guest fails to boot (LP: #1732804)
    - vfio/pci: Virtualize Maximum Payload Size
    - vfio/pci: Virtualize Maximum Read Request Size

  * hisi_sas: Add ATA command support for SMR disks (LP: #1739891)
    - scsi: hisi_sas: support zone management commands

  * thunderx2: i2c driver PEC and ACPI clock fixes (LP: #1738073)
    - ACPI / APD: Add clock frequency for ThunderX2 I2C controller
    - i2c: xlp9xx: Get clock frequency with clk API
    - i2c: xlp9xx: Handle
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
I ran the tests against the i386 -proposed kernel and cannot reproduce the issue with the fixed kernel. I also ran the ADT tests, which run the memory hotplug tests too, and could not reproduce the issue. Tested and verified.

** Tags removed: verification-needed-artful
** Tags added: verification-done-artful

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1747069

Title:
  artful 4.13 i386 kernels crash after memory hotplug remove

Status in linux package in Ubuntu:
  In Progress

Bug description:
  == SRU Request, Artful ==

  Hotplug removal causes i386 crashes when exercised with the kernel selftest mem-on-off-test script.

  == Fix ==

  Revert commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")

  Note: A fix exists in 4.15; however, it requires a set of changes that is far too large to be SRU'able, so the least risky way forward is to revert the offending commit.

  == Testcase ==

  Running the kernel selftest script mem-on-off-test.sh, followed by a sync, followed by re-installing kernel packages will always trigger this issue. Simply running the mem-on-off-test.sh script sometimes won't trigger the problem. I believe this is why we've not seen this happen too frequently with our ADT tests. I can reproduce this in a VM with 4 CPUs and 2GB of memory.

  == Regression Potential ==

  Reverting this commit does remove some functionality; however, this does not regress the kernel compared to previous releases, and having a working, reliable memory hotplug is the preferred option. This fix does touch some memory hotplug code, so there is a risk that it may break functionality that is not covered by the kernel regression testing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1747069/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-artful
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
** Description changed:

- It seems all of the artful 4.13 are crashing on i386 install when memory
- hotplug removal is attempted. This crash occurs a few seconds after the
- removal. I have done a gross bisect back to the first 4.13.0-11 which
- is also affected.
+ == SRU Request, Artful ==
+
+ Hotplug removal causes i386 crashes when exercised with the kernel
+ selftest mem-on-off-test script.
+
+ == Fix ==
+
+ Revert commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
+ hotadded memory to zones until online")
+
+ Note: A fix exists in 4.15; however, it requires a set of changes that
+ is far too large to be SRU'able, so the least risky way forward is to
+ revert the offending commit.
+
+ == Testcase ==
+
+ Running the kernel selftest script mem-on-off-test.sh, followed by a
+ sync, followed by re-installing kernel packages will always trigger this
+ issue. Simply running the mem-on-off-test.sh script sometimes won't
+ trigger the problem. I believe this is why we've not seen this happen
+ too frequently with our ADT tests. I can reproduce this in a VM with 4
+ CPUs and 2GB of memory.
+
+ == Regression Potential ==
+
+ Reverting this commit does remove some functionality; however, this does
+ not regress the kernel compared to previous releases, and having a
+ working, reliable memory hotplug is the preferred option. This fix does
+ touch some memory hotplug code, so there is a risk that it may break
+ functionality that is not covered by the kernel regression testing.
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
Ignore my above comments. I bisected this again using a more reliable reproducer and found the first bad commit to be:

f1dd2cd13c4bbbc9a7c4617b3b034fa643de98fe is the first bad commit
commit f1dd2cd13c4bbbc9a7c4617b3b034fa643de98fe
Author: Michal Hocko
Date:   Thu Jul 6 15:38:11 2017 -0700

    mm, memory_hotplug: do not associate hotadded memory to zones until online

    The current memory hotplug implementation relies on having all the
    struct pages associate with a zone/node during the physical hotplug phase
    (arch_add_memory->__add_pages->__add_section->__add_zone). In the vast
    majority of cases this means that they are added to ZONE_NORMAL. This
    has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd
    without sparsemem") and it wasn't a big deal back then because movable
    onlining didn't exist yet.

    Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
    onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
    memory and portion memory") and then things got more complicated.
    Rather than reconsidering the zone association which was no longer
    needed (because the memory hotplug already depended on SPARSEMEM) a
    convoluted semantic of zone shifting has been developed. Only the
    currently last memblock or the one adjacent to the zone_movable can be
    onlined movable. This essentially means that the online type changes as
    the new memblocks are added.

    Let's simulate memory hot online manually
    $ echo 0x1 > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory32/valid_zones
    Normal Movable

    $ echo $((0x1+(128<<20))) > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable

    $ echo $((0x1+2*(128<<20))) > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal
    /sys/devices/system/memory/memory34/valid_zones:Normal Movable

    $ echo online_movable > /sys/devices/system/memory/memory34/state
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Movable Normal

    This is an awkward semantic because an udev event is sent as soon as the
    block is onlined and an udev handler might want to online it based on
    some policy (e.g. association with a node) but it will inherently race
    with new blocks showing up.

    This patch changes the physical online phase to not associate pages
    with any zone at all. All the pages are just marked reserved and wait
    for the onlining phase to be associated with the zone as per the online
    request. There are only two requirements
        - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
        - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
    the latter one is not an inherent requirement and can be changed in the
    future. It preserves the current behavior and made the code slightly
    simpler. This is subject to change in future.

    This means that the same physical online steps as above will lead to the
    following state:
    Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Movable

    Implementation:
    The current move_pfn_range is reimplemented to check the above
    requirements (allow_online_pfn_range) and then updates the respective
    zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
    pfn range with the zone/node. __add_pages is updated to not require the
    zone and only initializes sections in the range. This allowed to
    simplify the arch_add_memory code (s390 could get rid of quite some of
    code).

    devm_memremap_pages is the only user of arch_add_memory which relies on
    the zone association because it only hooks into the memory hotplug only
    half way. It uses it to associate the new memory with ZONE_DEVICE but
    doesn't allow it to be {on,off}lined via sysfs. This means that this
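The two valid_zones transcripts in the commit message above can be reproduced by a toy model. This is not the kernel's implementation, only a sketch of the rules as the commit message states them: before the patch, every probed block is tied to ZONE_NORMAL and only the block at the Normal/Movable boundary may shift; after it, offline blocks have no zone and an online request is checked only against "Normal precedes Movable, no overlap". The function names are hypothetical, and the 128 MiB block size is an assumption that matches the transcript's `128<<20` step (the real value is exposed in `/sys/devices/system/memory/block_size_bytes`). Under that assumption, memory32 starts at 32 × 128 MiB = 0x100000000.

```python
from typing import Optional

BLOCK_SIZE = 128 << 20  # assumption: 128 MiB memory blocks, matching the transcript


def block_index(phys_addr: int) -> int:
    """Index N of the /sys/devices/system/memory/memoryN block holding phys_addr."""
    return phys_addr // BLOCK_SIZE


def old_valid_zones(zones: list[str]) -> list[str]:
    """Toy model of the pre-patch "zone shifting" rule.

    zones[i] is the zone block i is tied to ("Normal" or "Movable"), in physical
    order; probed blocks start out tied to "Normal". Only the last Normal block
    may shift to Movable, and only the first Movable block may shift back.
    """
    result = []
    for i, zone in enumerate(zones):
        if zone == "Normal":
            last_normal = i + 1 == len(zones) or zones[i + 1] == "Movable"
            result.append("Normal Movable" if last_normal else "Normal")
        else:
            first_movable = i == 0 or zones[i - 1] == "Normal"
            result.append("Movable Normal" if first_movable else "Movable")
    return result


def new_valid_zones(state: list[Optional[str]]) -> list[str]:
    """Toy model of the post-patch rule.

    state[i] is None for an offline block (no zone yet) or the zone an online
    block was onlined into. Online blocks report only their zone; an offline
    block may become Normal only if no Movable block lies below it, and Movable
    only if no Normal block lies above it (Normal precedes Movable).
    """
    result = []
    for i, zone in enumerate(state):
        if zone is not None:
            result.append(zone)
            continue
        options = []
        if all(z != "Movable" for z in state[:i]):
            options.append("Normal")
        if all(z != "Normal" for z in state[i + 1:]):
            options.append("Movable")
        result.append(" ".join(options))
    return result
```

In the old model, appending a Normal block changes which existing block may go Movable (`old_valid_zones(["Normal"])` versus `old_valid_zones(["Normal", "Normal"])`), which is the race with udev handlers that the commit message describes; in the new model the answer for an offline block depends only on blocks that are already online.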
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
Requires backports of commits:

a86d69d58aad561b6bbb44e60f74c41cd4b5f3ab
ed067d4a859ff696373324c5061392e013a7561a
f7f99100d8d95dbcf09e0216a143211e79418b9f
a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
ea1f5f3712afe895dfa4176ec87376b4a9ac23be
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
Fixed in 4.14 with:

f7f99100d8d95dbcf09e0216a143211e79418b9f is the first bad commit
commit f7f99100d8d95dbcf09e0216a143211e79418b9f
Author: Pavel Tatashin
Date:   Wed Nov 15 17:36:44 2017 -0800

    mm: stop zeroing memory during allocation in vmemmap
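Both searches in this thread are binary searches over history: the regression bisect looks for the first commit where the crash appears, and the search for the upstream fix looks for the first commit where it disappears, which is presumably why its output still reads "first bad commit". Below is a minimal sketch of that search, assuming a linear history in which the predicate flips exactly once (the real `git bisect` also copes with merges and can drive a test script per step via `git bisect run`). The `reliably` wrapper is a hypothetical addition reflecting the SRU note that a single mem-on-off-test.sh run sometimes won't trigger the crash: a commit counts as bad if any of several runs reproduces it.

```python
from typing import Callable, Sequence


def first_flip(commits: Sequence[str], flipped: Callable[[str], bool]) -> str:
    """Return the first commit for which `flipped` is True.

    Assumes a linear history where commits[0] is known not flipped, commits[-1]
    is known flipped, and the predicate changes exactly once in between.
    """
    lo, hi = 0, len(commits) - 1  # invariant: not flipped at lo, flipped at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if flipped(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def reliably(reproduces: Callable[[str], bool], runs: int) -> Callable[[str], bool]:
    """Wrap a flaky reproducer: a commit counts as bad if any of `runs` attempts crash."""
    def wrapped(commit: str) -> bool:
        return any(reproduces(commit) for _ in range(runs))
    return wrapped
```

To find a regression, pass a predicate meaning "the crash reproduces"; to find a fix, pass one meaning "the crash no longer reproduces". For example, `first_flip(commits, reliably(run_mem_on_off_test, 5))`, where `run_mem_on_off_test` is a hypothetical function that builds, boots and tests one commit. More runs per step cost time but make it less likely that a bad commit is mislabeled good, which would send the search down the wrong half of the history.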
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
Bisected; bad commit: b3c6858fb172512f63838523ae7817ae8adec564. This is a merge commit containing many miscellaneous changes across the tree, any of which may have broken this.
[Kernel-packages] [Bug 1747069] Re: artful 4.13 i386 kernels crash after memory hotplug remove
** Changed in: linux (Ubuntu)
       Status: Confirmed => In Progress