[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2024-02-28 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-azure-fips/5.15.0-1058.66+fips1 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If
the problem is solved, change the tag
'verification-needed-jammy-linux-azure-fips' to
'verification-done-jammy-linux-azure-fips'. If the problem still exists,
change the tag 'verification-needed-jammy-linux-azure-fips' to
'verification-failed-jammy-linux-azure-fips'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-azure-fips-v2 
verification-needed-jammy-linux-azure-fips

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/2042568

Title:
  Azure - Kernel crashes when removing gpu from pci

Status in linux-azure package in Ubuntu:
  Invalid
Status in linux-azure source package in Jammy:
  Fix Released
Status in linux-azure source package in Lunar:
  Fix Committed

Bug description:
  [Description]

  On an Azure VM with a Tesla GPU, removing the GPU from the PCI bus
  crashes the VM. If the NVIDIA drivers are loaded, the machine does not
  crash immediately; instead, the removal hangs and the machine crashes
  on reboot.

  This is related to bug [1].
  The bug reported in [1] concerns another driver, but the root cause is the same.
  It is still under investigation whether this is a bug in PCI itself or
  in how various drivers use PCI.

  In this case we have identified that removing commit [2] prevents the
  kernel crashes.

  Azure has requested that this commit be reverted, at least for the time being.
  The commit is not upstream, so it only needs to be reverted from the
  Ubuntu kernels.

  [Test Case]

  On an Azure VM with a GPU:

  # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove

  where '0001:00:00.0' is the PCI address of the GPU.
  The VM will crash.
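
  The test case above hard-codes the sysfs path. As a minimal sketch, the
  path could be built from a PCI address like this. The helper name
  pci_remove_path is hypothetical and not part of the bug report, and the
  address check assumes a 4-hex-digit domain as in the example address.
  The script only prints the command; it does not write to sysfs.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): build the sysfs "remove"
# path for a PCI device from its full address (domain:bus:device.function).
pci_remove_path() {
    addr=$1
    # Assumption: addresses look like 0001:00:00.0 (4-digit hex domain,
    # 2-digit hex bus and device, function 0-7).
    if ! printf '%s\n' "$addr" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'; then
        echo "invalid PCI address: $addr" >&2
        return 1
    fi
    printf '/sys/bus/pci/devices/%s/remove\n' "$addr"
}

# Dry run: print the command that would trigger the removal.
# Running the printed command as root is what crashes an affected VM.
path=$(pci_remove_path 0001:00:00.0) && echo "echo 1 > $path"
```

  Keeping this as a dry run is deliberate: on an affected kernel the real
  write is expected to crash the VM, so it should only be run on a
  disposable test instance.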

  [Where things could go wrong]

  The commit to be reverted was included in a patchset addressing LP
  bugs https://bugs.launchpad.net/bugs/2023071 and
  https://bugs.launchpad.net/bugs/2023594

  However, this commit only reduces boot time, so reverting it should not
  introduce any regressions.
  The expected side effect is an increase in boot time.

  [Other]

  Only Ubuntu Azure kernels are affected:

  - Jammy 5.15
  - Lunar 6.2

  Focal is also affected, since it uses the 5.15 kernel.
  This commit does not appear in the Mantic 6.5 kernel.

  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50=75af0c10b3703400890d314d1d91d25294234a81

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/2042568/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2024-02-15 Thread Launchpad Bug Tracker
This bug was fixed in the package linux-azure - 5.15.0-1056.64

---
linux-azure (5.15.0-1056.64) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1056.64 -proposed tracker (LP: #2052545)

  * Azure: Fix regression introduced in LP: #2045069 (LP: #2052453)
    - hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
    - hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed

linux-azure (5.15.0-1055.63) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1055.63 -proposed tracker (LP: #2048291)

  * Azure - Kernel crashes when removing gpu from pci (LP: #2042568)
    - Revert "PCI: hv: Use async probing to reduce boot time"

  * Azure: mlx5e: Add support for PCI relaxed ordering (RO) for better
    performance (LP: #2039208)
    - RDMA/mlx5: Reorder calls to pcie_relaxed_ordering_enabled()
    - RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write

  * Azure: Deprecate Netvsc and implement MANA direct (LP: #2045069)
    - hv_netvsc: fix race of netvsc and VF register_netdevice
    - hv_netvsc: Fix race of register_netdevice_notifier and VF register
    - hv_netvsc: Mark VF as slave before exposing it to user-mode

  [ Ubuntu: 5.15.0-94.104 ]

  * jammy/linux: 5.15.0-94.104 -proposed tracker (LP: #2048777)
  * [SRU] Duplicate Device_dax ids Created and hence Probing is Failing.
    (LP: #2028158)
    - device-dax: Fix duplicate 'hmem' device registration
  * Add ODM driver f81604 usb-can (LP: #2045387)
    - can: usb: f81604: add Fintek F81604 support
    - [Config] updateconfigs for ODM drivers CONFIG_CAN_F81604
  * Add ODM driver gpio-m058ssan (LP: #2045386)
    - SAUCE: ODM: gpio: add M058SSAN gpio driver
    - [Config] updateconfigs for ODM drivers CONFIG_GPIO_M058SSAN
  * Add ODM driver rtc-pcf85263 (LP: #2045385)
    - SAUCE: ODM: rtc: add PCF85263 RTC driver
    - [Config] updateconfigs for ODM drivers CONFIG_RTC_DRV_PCF85263
  * AppArmor patch for mq-posix interface is missing in jammy (LP: #2045384)
    - SAUCE: (no-up) apparmor: reserve mediation classes
    - SAUCE: (no-up) apparmor: Add fine grained mediation of posix mqueues
  * Packaging resync (LP: #1786013)
    - [Packaging] update annotations scripts

  [ Ubuntu: 5.15.0-93.103 ]

  * jammy/linux: 5.15.0-93.103 -proposed tracker (LP: #2048330)
  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] remove helper scripts
    - [Packaging] update annotations scripts
    - debian/dkms-versions -- update from kernel-versions (main/2024.01.08)
  * Hotplugging SCSI disk in QEMU VM fails (LP: #2047382)
    - Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
  * CVE-2023-6622
    - netfilter: nf_tables: bail out on mismatching dynset and set expressions
  * CVE-2024-0193
    - netfilter: nf_tables: skip set commit for deleted/destroyed sets
  * CVE-2023-6040
    - netfilter: nf_tables: Reject tables of unsupported family
  * Patches needed for AmpereOne (arm64) (LP: #2044192)
    - clocksource/arm_arch_timer: Add build-time guards for unhandled register
      accesses
    - clocksource/drivers/arm_arch_timer: Drop CNT*_TVAL read accessors
    - clocksource/drivers/arm_arch_timer: Extend write side of timer register
      accessors to u64
    - clocksource/drivers/arm_arch_timer: Move system register timer programming
      over to CVAL
    - clocksource/drivers/arm_arch_timer: Move drop _tval from erratum function
      names
    - clocksource/drivers/arm_arch_timer: Fix MMIO base address vs callback
      ordering issue
    - clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to
      CVAL
    - clocksource/drivers/arm_arch_timer: Advertise 56bit timer to the core code
    - clocksource/drivers/arm_arch_timer: Work around broken CVAL
      implementations
    - clocksource/drivers/arm_arch_timer: Remove any trace of the TVAL
      programming interface
    - clocksource/drivers/arm_arch_timer: Drop unnecessary ISB on CVAL
      programming
    - clocksource/drivers/arm_arch_timer: Fix masking for high freq counters
    - clocksource/drivers/arch_arm_timer: Move workaround synchronisation around
  * Add quirk to disable i915 fastboot on B PC (LP: #2047630)
    - SAUCE: i915: force disable fastboot quirk
  * Some machines can't pass the pm-graph test (LP: #2046217)
    - wifi: iwlwifi: pcie: rescan bus if no parent
  * Sound: Add rtl quirk of M90-Gen5 (LP: #2046105)
    - ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5
  * linux tools packages for derived kernels refuse to install simultaneously
    due to libcpupower name collision (LP: #2035971)
    - [Packaging] Statically link libcpupower into cpupower tool
  * [Debian] autoreconstruct - Do not generate chmod -x for deleted files
    (LP: #2045562)
    - [Debian] autoreconstruct - Do not generate chmod -x for deleted files
  * CVE-2023-6931
    - perf/core: Add a new read format to get a number of lost samples
- 

[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2024-01-24 Thread Ioanna Alifieraki
# VERIFICATION JAMMY

This verification uses the jammy 5.15.0-1055 kernel on a focal machine.
Reproducing this issue requires a special Azure instance, and the one I
currently have access to is a focal VM.

root@jo-twosla:/home/ubuntu# uname -rv
5.15.0-1055-azure #63~20.04.1-Ubuntu SMP Thu Jan 18 15:30:26 UTC 2024


root@jo-twosla:/home/ubuntu# lspci
0001:00:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
3240:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)

root@jo-twosla:/home/ubuntu# echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove

root@jo-twosla:/home/ubuntu# lspci
3240:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)

The gpu is successfully removed without the vm crashing.
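
The before/after comparison above was done by eye. A rough sketch of the
same check in script form follows, assuming lspci-style output where the
PCI address is the first field of each line. The function name pci_listed
is hypothetical, and the sample text is copied from the transcript above
rather than captured live.

```shell
#!/bin/sh
# Hypothetical check: succeed if the given PCI address is the first field
# of some line in lspci-style output read from stdin.
pci_listed() {
    # Exact string comparison, so the '.' in the address is not a wildcard.
    awk -v addr="$1" '$1 == addr { found = 1 } END { exit !found }'
}

# Sample output copied from the verification transcript.
before='0001:00:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
3240:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)'
after='3240:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)'

printf '%s\n' "$before" | pci_listed 0001:00:00.0 && echo "GPU present before removal"
printf '%s\n' "$after" | pci_listed 0001:00:00.0 || echo "GPU absent after removal"
```

On a live system the same function could be fed from lspci directly, for
example `lspci -D | pci_listed 0001:00:00.0`, where -D forces the domain
to be printed.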

** Tags removed: verification-needed-jammy-linux-azure
** Tags added: verification-done-jammy-linux-azure



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2024-01-17 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-azure/5.15.0-1055.63
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy-linux-azure' to
'verification-done-jammy-linux-azure'. If the problem still exists,
change the tag 'verification-needed-jammy-linux-azure' to
'verification-failed-jammy-linux-azure'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-azure-v2 
verification-needed-jammy-linux-azure



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2023-12-07 Thread Tim Gardner
** Changed in: linux-azure (Ubuntu Jammy)
   Status: Confirmed => Fix Committed

** Changed in: linux-azure (Ubuntu Lunar)
   Status: Confirmed => Fix Committed

** Changed in: linux-azure (Ubuntu)
   Status: New => Invalid



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2023-11-17 Thread Ioanna Alifieraki
It turns out that the 6.2 kernel is also affected, but the bug appears
much less often: roughly once every 20 reboots.

** Changed in: linux-azure (Ubuntu Lunar)
   Status: Invalid => Confirmed

** Changed in: linux-azure (Ubuntu Jammy)
   Status: New => Confirmed

** Description changed:

  [Description]
  
  On a VM on Azure with a Tesla gpu it was noticed that when removing the
  gpu from the pci the vm would crash. In case the nvidia drivers are
  loaded, the machine won't crash. Instead the removing process will hang
  and the machine will crash on reboot.
  
  This is related to bug [1].
  The bug reported in [1] regards another driver but the root cause is the same.
  It is still investigated whether this is a bug in pci, or it is a bug of 
various drivers on how they use pci.
  
  For this case we have identified that removing commit [2] prevents the
  kernel crashes.
  
  Azure has requested to revert this commit, at least for the time being.
  This commit is not in upstream, so it just need to be reverted from Ubuntu 
kernels.
  
  [Test Case]
  
  On an Azure vm with a gpu :
  
  # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
  
  where '0001:00:00.0' the pci address of the gpu.
  The vm will crash.
  
  [Where things could go wrong]
  
  The commit to be reverted was included in a patchset to address lp bugs
  https://bugs.launchpad.net/bugs/2023071 and
  https://bugs.launchpad.net/bugs/2023594
  
  However this commit just reduces boot time and removing shall not introduce 
any regressions.
  Side effects will be increase in the boot time.
  
  [Other]
  
  Only Ubuntu azure kernels are affected :
  
  - Jammy 5.15
+ - Lunar 6.2
  
  Focal is also affected since it's using 5.15 kernel.
  This commit does not appear in Mantic 6.5 kernel.
  
  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] 
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50=75af0c10b3703400890d314d1d91d25294234a81



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2023-11-02 Thread Ioanna Alifieraki
Upon further testing, the Lunar 6.2 kernel does not seem to be affected.
I'll investigate further to find out why.

** Changed in: linux-azure (Ubuntu Lunar)
   Status: New => Invalid

** Description changed:

  [Description]
  
  On a VM on Azure with a Tesla gpu it was noticed that when removing the
  gpu from the pci the vm would crash. In case the nvidia drivers are
  loaded, the machine won't crash. Instead the removing process will hang
  and the machine will crash on reboot.
  
  This is related to bug [1].
  The bug reported in [1] regards another driver but the root cause is the same.
  It is still investigated whether this is a bug in pci, or it is a bug of 
various drivers on how they use pci.
  
  For this case we have identified that removing commit [2] prevents the
  kernel crashes.
  
  Azure has requested to revert this commit, at least for the time being.
  This commit is not in upstream, so it just need to be reverted from Ubuntu 
kernels.
  
  [Test Case]
  
  On an Azure vm with a gpu :
  
  # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
  
  where '0001:00:00.0' the pci address of the gpu.
  The vm will crash.
  
  [Where things could go wrong]
  
  The commit to be reverted was included in a patchset to address lp bugs
  https://bugs.launchpad.net/bugs/2023071 and
  https://bugs.launchpad.net/bugs/2023594
  
  However this commit just reduces boot time and removing shall not introduce 
any regressions.
  Side effects will be increase in the boot time.
  
  [Other]
  
  Only Ubuntu azure kernels are affected :
  
  - Jammy 5.15
- - Lunar 6.2
  
  Focal is also affected since it's using 5.15 kernel.
  This commit does not appear in Mantic 6.5 kernel.
  
- 
- 
  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] 
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50=75af0c10b3703400890d314d1d91d25294234a81



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2023-11-02 Thread Ioanna Alifieraki
** Description changed:

  [Description]
  
  On a VM on Azure with a Tesla gpu it was noticed that when removing the
  gpu from the pci the vm would crash. In case the nvidia drivers are
  loaded, the machine won't crash. Instead the removing process will hang
  and the machine will crash on reboot.
  
  This is related to bug [1].
  The bug reported in [1] regards another driver but the root cause is the same.
  It is still investigated whether this is a bug in pci, or it is a bug of 
various drivers on how they use pci.
  
  For this case we have identified that removing commit [2] prevents the
  kernel crashes.
  
  Azure has requested to revert this commit, at least for the time being.
  This commit is not in upstream, so it just need to be reverted from Ubuntu 
kernels.
  
  [Test Case]
  
  On an Azure vm with a gpu :
  
  # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
  
  where '0001:00:00.0' the pci address of the gpu.
  The vm will crash.
  
  [Where things could go wrong]
  
+ The commit to be reverted was included in a patchset to address lp bugs
+ https://bugs.launchpad.net/bugs/2023071 and
+ https://bugs.launchpad.net/bugs/2023594
+ 
+ However this commit just reduces boot time and removing shall not introduce 
any regressions.
+ Side effects will be increase in the boot time.
+ 
  [Other]
+ 
+ Only Ubuntu azure kernels are affected :
+ 
+ - Jammy 5.15
+ - Lunar 6.2
+ 
+ Focal is also affected since it's using 5.15 kernel.
+ This commit does not appear in Mantic 6.5 kernel.
+ 
+ 
  
  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] 
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50=75af0c10b3703400890d314d1d91d25294234a81



[Kernel-packages] [Bug 2042568] Re: Azure - Kernel crashes when removing gpu from pci

2023-11-02 Thread Ioanna Alifieraki
** Description changed:

  [Description]
+ 
+ On a VM on Azure with a Tesla gpu it was noticed that when removing the
+ gpu from the pci the vm would crash. In case the nvidia drivers are
+ loaded, the machine won't crash. Instead the removing process will hang
+ and the machine will crash on reboot.
+ 
+ This is related to bug [1].
+ The bug reported in [1] regards another driver but the root cause is the same.
+ It is still investigated whether this is a bug in pci, or it is a bug of 
various drivers on how they use pci.
+ 
+ For this case we have identified that removing commit [2] prevents the
+ kernel crashes.
+ 
+ Azure has requested to revert this commit, at least for the time being.
+ This commit is not in upstream, so it just need to be reverted from Ubuntu 
kernels.
  
  [Test Case]
  
+ On an Azure vm with a gpu :
+ 
+ # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
+ 
+ where '0001:00:00.0' the pci address of the gpu.
+ The vm will crash.
  
  [Where things could go wrong]
  
+ [Other]
  
- [Other]
+ [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
+ [2] 
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50=75af0c10b3703400890d314d1d91d25294234a81
