From: Jinhui Guo <[email protected]>

[ Upstream commit 936750fdba4c45e13bbd17f261bb140dd55f5e93 ]

The race window between __scan_channels() and deliver_response() causes
the parameters of some channels to be set to 0.

1.[CPUA] __scan_channels() issues an IPMI request and waits with
         wait_event() until all channels have been scanned.
         wait_event() internally calls might_sleep(), which might
         yield the CPU. (Moreover, an interrupt can preempt
         wait_event() and force the task to yield the CPU.)
2.[CPUB] deliver_response() is invoked when the CPU receives the
         IPMI response. After processing an IPMI response,
         deliver_response() directly assigns intf->wchannels to
         intf->channel_list and sets intf->channels_ready to true.
         However, not all channels are actually ready for use.
3.[CPUA] Since intf->channels_ready is already true, wait_event()
         never enters __wait_event(). __scan_channels() immediately
         clears intf->null_user_handler and exits.
4.[CPUB] Once intf->null_user_handler is set to NULL, deliver_response()
         ignores further IPMI responses, leaving the remaining
         channels zero-initialized and unusable.

CPUA                             CPUB
-------------------------------  -----------------------------
__scan_channels()
 intf->null_user_handler
       = channel_handler;
 send_channel_info_cmd(intf,
       0);
 wait_event(intf->waitq,
       intf->channels_ready);
  do {
   might_sleep();
                                 deliver_response()
                                  channel_handler()
                                   intf->channel_list =
                                         intf->wchannels + set;
                                   intf->channels_ready = true;
                                   send_channel_info_cmd(intf,
                                         intf->curr_channel);
   if (condition)
    break;
   __wait_event(wq_head,
          condition);
  } while(0)
 intf->null_user_handler
       = NULL;
                                 deliver_response()
                                  if (!msg->user)
                                   if (intf->null_user_handler)
                                    rv = -EINVAL;
                                  return rv;
-------------------------------  -----------------------------

Fix the race between __scan_channels() and deliver_response() by
deferring both the assignment intf->channel_list = intf->wchannels
and the flag intf->channels_ready = true until all channels have
been successfully scanned or until the IPMI request has failed.

Signed-off-by: Jinhui Guo <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---

LLM Generated explanations, may be completely bogus:

## Summary Analysis

### 1. BUG MECHANISM

This commit fixes a **race condition** between two IPMI functions:

- `__scan_channels()`: Initiates channel scanning, sets
  `null_user_handler = channel_handler`, then waits for `channels_ready`
- `channel_handler()` (called via `deliver_response()`): Processes IPMI
  responses for each channel

**The Bug** (introduced in commit 31b0b0730ad2a from September 2017):

In `channel_handler()`, after processing channel N's response (where N <
IPMI_MAX_CHANNELS-1):
1. Lines 3420-3421 assign `channel_list` and set `channels_ready = true`
   **prematurely**
2. Then sends request for channel N+1
3. `__scan_channels()` may see `channels_ready == true` and exit the
   wait loop early
4. `__scan_channels()` sets `null_user_handler = NULL`
5. Responses for channels N+1, N+2, etc. arrive but are **discarded**
   because `null_user_handler` is NULL
6. Remaining channels are **zero-initialized and unusable**

### 2. THE FIX

The fix simply **removes the 2 lines** in the `else` branch that
prematurely assign `channel_list` and set `channels_ready = true`. After
the fix, `channels_ready = true` is only set when:
- All channels have been scanned (`curr_channel >= IPMI_MAX_CHANNELS`),
  OR
- An error occurs (`rv != 0`)
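
The difference between the old and new behavior can be illustrated with a
small userspace model. This is only a sketch, not the driver code: the
`sim_*` names, the `scanned[]` array, and the value of `SIM_MAX_CHANNELS`
are assumptions made for illustration, and the model hard-codes the
worst-case interleaving in which the waiter observes `channels_ready`
immediately after every response.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IPMI_MAX_CHANNELS; the value here is an assumption. */
#define SIM_MAX_CHANNELS 16

/* Hypothetical, simplified model of the fields involved in the race. */
struct sim_intf {
	bool handler_set;     /* models intf->null_user_handler != NULL */
	bool channels_ready;  /* models intf->channels_ready */
	int curr_channel;     /* models intf->curr_channel */
	bool scanned[SIM_MAX_CHANNELS]; /* true once a response is recorded */
};

/* Models deliver_response() + channel_handler() for one response. */
static void sim_deliver_response(struct sim_intf *intf, bool premature_ready)
{
	if (!intf->handler_set)
		return; /* handler cleared: the response is ignored */

	assert(intf->curr_channel < SIM_MAX_CHANNELS);
	intf->scanned[intf->curr_channel] = true;
	intf->curr_channel++;

	if (intf->curr_channel >= SIM_MAX_CHANNELS)
		intf->channels_ready = true;  /* scan complete */
	else if (premature_ready)
		intf->channels_ready = true;  /* the assignment the patch removes */
}

/*
 * Runs one scan under the worst-case interleaving: the waiter re-checks
 * channels_ready after every response and clears the handler as soon as
 * the flag is true. Returns how many channels were recorded.
 */
static int sim_scan(bool premature_ready)
{
	struct sim_intf intf = { .handler_set = true };
	int i, recorded = 0;

	for (i = 0; i < SIM_MAX_CHANNELS; i++) {
		sim_deliver_response(&intf, premature_ready);
		if (intf.channels_ready)
			intf.handler_set = false; /* models __scan_channels() exiting */
	}

	for (i = 0; i < SIM_MAX_CHANNELS; i++)
		recorded += intf.scanned[i] ? 1 : 0;

	return recorded;
}
```

In this model, `sim_scan(true)` records only the first channel, while
`sim_scan(false)` records all `SIM_MAX_CHANNELS` channels before the
handler is cleared.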

### 3. CLASSIFICATION

| Criteria | Assessment |
|----------|------------|
| Bug type | Race condition causing data corruption (zero-init channels) |
| Impact | IPMI channels become unusable on affected systems |
| Size | 2 lines removed - minimal and surgical |
| Risk | Very LOW - only removes premature assignments |
| Subsystem | IPMI - used for server management |

### 4. STABLE BACKPORT CRITERIA

| Criterion | Status |
|-----------|--------|
| Obviously correct | ✅ Yes - simply delays setting flag until the right time |
| Fixes real bug | ✅ Yes - race causes channels to be zero-initialized |
| User impacting | ✅ Yes - affects IPMI hardware management |
| Small and contained | ✅ Yes - 2 lines in single file |
| No new features | ✅ Correct - pure bug fix |
| No API changes | ✅ Correct - internal change only |

### 5. SIGNALS

**Positive signals:**
- Fixes a real race condition with clear cause and effect
- Minimal, surgical fix (2 lines removed)
- Bug has existed since 2017 (31b0b0730ad2a) - affects all current LTS
  kernels
- IPMI maintainer (Corey Minyard) signed off
- Detailed commit message explains the race with CPU timing diagram

**Missing signals:**
- No explicit `Cc: [email protected]` tag
- No explicit `Fixes: 31b0b0730ad2a` tag (though it should have one)
- No Tested-by/Reviewed-by tags

### 6. RISK vs BENEFIT

- **Benefit**: Fixes a race condition that makes IPMI channels unusable
  on servers
- **Risk**: Extremely low - the fix only removes code that ran at the
  wrong time; the correct code paths for setting `channels_ready` are
  untouched
- **Affected users**: Server/data center users relying on IPMI for
  hardware management

### 7. DEPENDENCIES

The fix is standalone and doesn't depend on other commits. The affected
code has existed unchanged since 2017, so it should apply cleanly to all
active stable kernels.

### CONCLUSION

This is a clear backport candidate. The commit fixes a real race
condition that causes IPMI channels to become unusable. The fix is
minimal (2 lines removed), obviously correct (simply delays a flag until
the right time), and carries virtually no regression risk. IPMI is
critical infrastructure for server management, making this fix important
for stable users.

**YES**

 drivers/char/ipmi/ipmi_msghandler.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 3700ab4eba3e7..d3f84deee4513 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -3417,8 +3417,6 @@ channel_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)
                        intf->channels_ready = true;
                        wake_up(&intf->waitq);
                } else {
-                       intf->channel_list = intf->wchannels + set;
-                       intf->channels_ready = true;
                        rv = send_channel_info_cmd(intf, intf->curr_channel);
                }
 
-- 
2.51.0



_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
