On Sun, Oct 09, 2022 at 05:18:09PM +0800, Zhang Yuchen wrote: > If you continue to access and send messages at a high frequency (once > every 55s) when the IPMI is disconnected, messages will accumulate in > intf->[hp_]xmit_msg. If it lasts long enough, it takes up a lot of > memory.
This is queued for 6.2. Thanks. I already had the other two patches queued. -corey > > The reason is that if IPMI is disconnected, each message will be set to > IDLE after it returns to HOSED through IDLE->ERROR0->HOSED. The next > message goes through the same process when it comes in. This process > needs to wait for IBF_TIMEOUT * (MAX_ERROR_RETRIES + 1) = 55s. > > Each message takes 55S to destroy. This results in a continuous increase > in memory. > > I find that if I wait 5 seconds after the first message fails, the > status changes to ERROR0 in smi_timeout(). The next message will return > the error code IPMI_NOT_IN_MY_STATE_ERR directly without wait. > > This is more in line with our needs. > > So instead of setting each message state to IDLE after it reaches the > state HOSED, set state to ERROR0. > > After testing, the problem has been solved, no matter how many > consecutive sends, will not cause continuous memory growth. It also > returns to normal immediately after the IPMI is restored. > > In addition, the HOSED state should also count as invalid. So the HOSED > is removed from the invalid judgment in start_kcs_transaction(). > > The verification operations are as follows: > > 1. Use BPF to record the ipmi_alloc/free_smi_msg(). > > $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc > %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}' > > 2. Exec `date; time for x in $(seq 1 2); do ipmitool mc info; done`. > 3. Record the output of `time` and when free all msgs. > > Before: > > `time` takes 120s, This is because `ipmitool mc info` send 4 msgs and > waits only 15 seconds for each message. Last msg is free after 440s. > > $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc > %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}' > Oct 05 11:40:55 Attaching 2 probes... > Oct 05 11:41:12 alloc 0xffff9558a05f0c00 > Oct 05 11:41:27 alloc 0xffff9558a05f1a00 > Oct 05 11:41:42 alloc 0xffff9558a05f0000 > Oct 05 11:41:57 alloc 0xffff9558a05f1400 > Oct 05 11:42:07 free 0xffff9558a05f0c00 > Oct 05 11:42:07 alloc 0xffff9558a05f7000 > Oct 05 11:42:22 alloc 0xffff9558a05f2a00 > Oct 05 11:42:37 alloc 0xffff9558a05f5a00 > Oct 05 11:42:52 alloc 0xffff9558a05f3a00 > Oct 05 11:43:02 free 0xffff9558a05f1a00 > Oct 05 11:43:57 free 0xffff9558a05f0000 > Oct 05 11:44:52 free 0xffff9558a05f1400 > Oct 05 11:45:47 free 0xffff9558a05f7000 > Oct 05 11:46:42 free 0xffff9558a05f2a00 > Oct 05 11:47:37 free 0xffff9558a05f5a00 > Oct 05 11:48:32 free 0xffff9558a05f3a00 > > $ root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do > ipmitool mc info; done > > Wed Oct 5 11:41:12 CST 2022 > No data available > Get Device ID command failed > No data available > No data available > No valid response received > Get Device ID command failed: Unspecified error > No data available > Get Device ID command failed > No data available > No data available > No valid response received > No data available > Get Device ID command failed > > real 1m55.052s > user 0m0.001s > sys 0m0.001s > > After: > > `time` takes 55s, all msgs is returned and free after 55s. > > $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc > %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}' > > Oct 07 16:30:35 Attaching 2 probes... > Oct 07 16:30:45 alloc 0xffff955943aa9800 > Oct 07 16:31:00 alloc 0xffff955943aacc00 > Oct 07 16:31:15 alloc 0xffff955943aa8c00 > Oct 07 16:31:30 alloc 0xffff955943aaf600 > Oct 07 16:31:40 free 0xffff955943aa9800 > Oct 07 16:31:40 free 0xffff955943aacc00 > Oct 07 16:31:40 free 0xffff955943aa8c00 > Oct 07 16:31:40 free 0xffff955943aaf600 > Oct 07 16:31:40 alloc 0xffff9558ec8f7e00 > Oct 07 16:31:40 free 0xffff9558ec8f7e00 > Oct 07 16:31:40 alloc 0xffff9558ec8f7800 > Oct 07 16:31:40 free 0xffff9558ec8f7800 > Oct 07 16:31:40 alloc 0xffff9558ec8f7e00 > Oct 07 16:31:40 free 0xffff9558ec8f7e00 > Oct 07 16:31:40 alloc 0xffff9558ec8f7800 > Oct 07 16:31:40 free 0xffff9558ec8f7800 > > root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do > ipmitool mc info; done > Fri Oct 7 16:30:45 CST 2022 > No data available > Get Device ID command failed > No data available > No data available > No valid response received > Get Device ID command failed: Unspecified error > Get Device ID command failed: 0xd5 Command not supported in present state > Get Device ID command failed: Command not supported in present state > > real 0m55.038s > user 0m0.001s > sys 0m0.001s > > Signed-off-by: Zhang Yuchen <zhangyuchen....@bytedance.com> > --- > drivers/char/ipmi/ipmi_kcs_sm.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/drivers/char/ipmi/ipmi_kcs_sm.c b/drivers/char/ipmi/ipmi_kcs_sm.c > index efda90dcf5b3..ecfcb50302f6 100644 > --- a/drivers/char/ipmi/ipmi_kcs_sm.c > +++ b/drivers/char/ipmi/ipmi_kcs_sm.c > @@ -122,10 +122,10 @@ struct si_sm_data { > unsigned long error0_timeout; > }; > > -static unsigned int init_kcs_data(struct si_sm_data *kcs, > - struct si_sm_io *io) > +static unsigned int init_kcs_data_with_state(struct si_sm_data *kcs, > + struct si_sm_io *io, enum kcs_states state) > { > - kcs->state = KCS_IDLE; > + kcs->state = state; > kcs->io = io; > kcs->write_pos = 0; > kcs->write_count = 0; > @@ -140,6 +140,12 @@ static unsigned int init_kcs_data(struct si_sm_data *kcs, > return 2; > } > > +static unsigned int init_kcs_data(struct si_sm_data *kcs, > + struct si_sm_io *io) > +{ > + return init_kcs_data_with_state(kcs, io, KCS_IDLE); > +} > + > static inline unsigned char read_status(struct si_sm_data *kcs) > { > return kcs->io->inputb(kcs->io, 1); > @@ -270,7 +276,7 @@ static int start_kcs_transaction(struct si_sm_data *kcs, > unsigned char *data, > if (size > MAX_KCS_WRITE_SIZE) > return IPMI_REQ_LEN_EXCEEDED_ERR; > > - if ((kcs->state != KCS_IDLE) && (kcs->state != KCS_HOSED)) { > + if (kcs->state != KCS_IDLE) { > dev_warn(kcs->io->dev, "KCS in invalid state %d\n", kcs->state); > return IPMI_NOT_IN_MY_STATE_ERR; > } > @@ -495,7 +501,7 @@ static enum si_sm_result kcs_event(struct si_sm_data > *kcs, long time) > } > > if (kcs->state == KCS_HOSED) { > - init_kcs_data(kcs, kcs->io); > + init_kcs_data_with_state(kcs, kcs->io, KCS_ERROR0); > return SI_SM_HOSED; > } > > -- > 2.30.2 > _______________________________________________ Openipmi-developer mailing list Openipmi-developer@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openipmi-developer