Public bug reported:

[IMPACT]

Commit 3b9a907223d7 ("ipmi: fix sleep-in-atomic in free_user at cleanup
SRCU user->release_barrier") defers the removal of an ipmi_user to the
system workqueue.

Whenever an ipmi_user struct is about to be removed, its removal is
scheduled as a work item on the system workqueue, which guarantees that
the free operation is not executed in atomic context. When the work item
runs, free_user_work() is invoked and frees the ipmi_user.

When the ipmi_msghandler module is removed, cleanup_ipmi() does not check
whether any work items are still pending.
This leaves a race condition: an ipmi_user is scheduled for removal and,
shortly afterwards, the ipmi_msghandler module is removed.
If the scheduled work is delayed for any reason and the module is removed
first, then by the time the work runs the pages holding free_user_work()
are gone and the system crashes with the following:

BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT
... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450
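
For readers decoding the oops above: the error_code value is a bit field
defined by the x86 architecture. The following is a small user-space
Python sketch, not kernel code, and the helper name decode_pf_error_code
is hypothetical. It decodes the low five bits of the value. For 0x0010 it
reports a supervisor-mode instruction fetch from a not-present page, which
matches the two #PF lines and is consistent with the CPU trying to execute
code at an address that is no longer mapped.

```python
# x86 page-fault error code bits (low five bits of the error code).
PF_PRESENT = 0x01  # 0: page not present, 1: protection violation
PF_WRITE = 0x02    # 0: read access, 1: write access
PF_USER = 0x04     # 0: supervisor mode, 1: user mode
PF_RSVD = 0x08     # 1: reserved bit set in a paging-structure entry
PF_INSTR = 0x10    # 1: fault caused by an instruction fetch


def decode_pf_error_code(code):
    """Return a short description of an x86 #PF error code."""
    mode = "user" if code & PF_USER else "supervisor"

    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write access"
    else:
        access = "read access"

    if not code & PF_PRESENT:
        cause = "not-present page"
    elif code & PF_RSVD:
        cause = "reserved bit violation"
    else:
        cause = "permissions violation"

    return f"{mode} {access}, {cause}"


if __name__ == "__main__":
    # Value taken from the "#PF: error_code(0x0010)" line of the oops.
    print(decode_pf_error_code(0x0010))
```

The faulting address in RIP equals the function pointer shown on the
"Workqueue: events" line, so the fetch that failed was the call into the
work handler itself.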

[TEST CASE]

The user who reported the issue can reproduce it reliably by stopping the
IPMI-related services and then removing the ipmi modules.
I could reproduce it only after converting the regular work item into a
delayed work item.
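
To illustrate why delaying the work makes the crash easy to hit, here is a
toy user-space model in Python. It is only a sketch under stated
assumptions: the classes Module and WorkQueue, the exception
ModuleUnloaded, and the function scenario are all hypothetical names, the
clock is an integer tick counter, and flush() merely stands in for
"wait for pending work before the module goes away". It is not the
upstream patch and does not use any kernel API.

```python
class ModuleUnloaded(Exception):
    """Stands in for the page fault on unmapped module text."""


class Module:
    """Toy module whose work handler is only valid while loaded."""

    def __init__(self):
        self.loaded = True
        self.freed_users = []

    def free_user_work(self, user):
        # In the kernel this handler lives in the module's text pages.
        if not self.loaded:
            raise ModuleUnloaded("handler called after module unload")
        self.freed_users.append(user)


class WorkQueue:
    """Toy shared workqueue driven by an explicit tick counter."""

    def __init__(self):
        self.now = 0
        self.pending = []  # entries are (due_tick, func, arg)

    def schedule(self, func, arg, delay=0):
        self.pending.append((self.now + delay, func, arg))

    def run_due(self):
        """Run every item whose due tick has been reached."""
        due = [w for w in self.pending if w[0] <= self.now]
        self.pending = [w for w in self.pending if w[0] > self.now]
        for _, func, arg in due:
            func(arg)

    def advance(self, ticks):
        self.now += ticks
        self.run_due()

    def flush(self):
        """Run everything still pending, regardless of its due tick."""
        items, self.pending = self.pending, []
        for _, func, arg in items:
            func(arg)


def scenario(delay, flush_on_cleanup):
    """Schedule one free, unload the module, then let time pass."""
    wq = WorkQueue()
    mod = Module()
    wq.schedule(mod.free_user_work, "user0", delay)
    wq.run_due()            # the worker gets one chance before unload
    if flush_on_cleanup:
        wq.flush()          # cleanup waits for pending work first
    mod.loaded = False      # module text is gone from here on
    try:
        wq.advance(delay)   # any leftover work runs after the unload
    except ModuleUnloaded:
        return "crash"
    return "ok"


if __name__ == "__main__":
    print(scenario(delay=0, flush_on_cleanup=False))
    print(scenario(delay=1, flush_on_cleanup=False))
    print(scenario(delay=1, flush_on_cleanup=True))
```

In this model the undelayed work runs before the unload and nothing goes
wrong, the delayed work runs after the unload and hits the stale handler,
and flushing in the cleanup path closes the window regardless of the delay.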

[WHERE PROBLEMS COULD OCCUR]

TBD

[OTHER]

Upstream is affected too; I am working on a patch to address this.

** Affects: linux (Ubuntu)
     Importance: Medium
     Assignee: Ioanna Alifieraki (joalif)
         Status: In Progress

** Affects: linux (Ubuntu Focal)
     Importance: Medium
     Assignee: Ioanna Alifieraki (joalif)
         Status: Confirmed

** Affects: linux (Ubuntu Hirsute)
     Importance: Medium
     Assignee: Ioanna Alifieraki (joalif)
         Status: Confirmed

** Affects: linux (Ubuntu Impish)
     Importance: Medium
     Assignee: Ioanna Alifieraki (joalif)
         Status: Confirmed

** Affects: linux (Ubuntu Jammy)
     Importance: Medium
     Assignee: Ioanna Alifieraki (joalif)
         Status: In Progress

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Impish)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Hirsute)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Impish)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Hirsute)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Impish)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Ioanna Alifieraki (joalif)

** Changed in: linux (Ubuntu Impish)
     Assignee: (unassigned) => Ioanna Alifieraki (joalif)

** Changed in: linux (Ubuntu Hirsute)
     Assignee: (unassigned) => Ioanna Alifieraki (joalif)

** Changed in: linux (Ubuntu Focal)
     Assignee: (unassigned) => Ioanna Alifieraki (joalif)

** Description changed:

  [IMPACT]
  
  Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
  pushes the removal of an ipmi_user into the system's workqueue.
  
- Whenever an ipmi_user struct is about to be removed it is scheduled as a work on
- the system's workqueue to guarantee the free operation won't be executed in atomic context.
- When the work is executed the free_user_work() function is invoked which frees the ipmi_user.
+ Whenever an ipmi_user struct is about to be removed it is scheduled as a
+ work on the system's workqueue to guarantee the free operation won't be
+ executed in atomic context. When the work is executed the
+ free_user_work() function is invoked which frees the ipmi_user.
  
- When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no 
- check if there are any pending works to be executed.
- Therefore, there is a potential race condition : 
- An ipmi_user is scheduled for removal and shortly to remove ipmi_msghandler module.
- If the scheduled work delays execution for any reason and the module is removed 
- first then when the work is executed the pages of free_user_work() are gone and
- the system crashes with the following :
+ When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
+ Therefore, there is a potential race condition :
+ An ipmi_user is scheduled for removal and shortly after to remove the ipmi_msghandler module.
+ If the scheduled work delays execution for any reason and the module is removed first, then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :
  
  BUG: unable to handle page fault for address: ffffffffc05c3450
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
  Oops: 0010 [#1] SMP PTI
  CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
  Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
  Workqueue: events 0xffffffffc05c3450
  RIP: 0010:0xffffffffc05c3450
  Code: Bad RIP value.
  RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
  RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
  RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
  RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
  R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
  R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
  FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
  ? process_one_work+0x20f/0x400
  worker_thread+0x34/0x410
  kthread+0x121/0x140
  ? process_one_work+0x400/0x400
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
- Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT 
+ Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT
  ... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
  CR2: ffffffffc05c3450
  
  [TEST CASE]
  
+ The user who reported the issue can reproduce reliably by stopping the ipmi related services and then removing the ipmi modules.
+ I could reproduce the issue only when turning the normal 'work' to delayed work.
+ 
  [WHERE PROBLEMS COULD OCCUR]
+ 
+ TBD
  
  [OTHER]
  
  Upstream is affected too, working on a patch to address this.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1950666

Title:
  system crash when removing ipmi_msghandler module

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Hirsute:
  Confirmed
Status in linux source package in Impish:
  Confirmed
Status in linux source package in Jammy:
  In Progress
