On 2022/3/28 09:38, Corey Minyard wrote:
On Mon, Mar 28, 2022 at 12:47:41AM +0800, Chen Guanqiao wrote:
We have found a scenario in which too many ipmi messages are sent in a short
period of time and a large number of users and messages are blocked in the
ipmi modules. As a result, ipmi occupies a large amount of system memory and
ipmi communication always fails.

Frequent ipmi calls combined with failing hardware communication cause this
condition. ipmi has no way to detect or report the problem, so it cannot be
located or diagnosed online.

Hmm.  So you have an application that just keeps sending IPMI messages
and not waiting for responses?  I think the first order of business
would be to fix your applications to not do that.

Hi, Corey

Actually, the patch just provides a way to locate and observe this problem online by displaying the number of users and messages. I haven't fully thought through how to solve the problem gracefully. Clearing the message queue is one method available to the administrator.

Because the module's memory consumption is counted as kernel memory, the administrator usually does not know the state of ipmi and cannot tell where the memory went.

Only when they try to execute 'rmmod ipmi' do they find out: oh, the memory is in ipmi.

The ipmi driver will eventually clean things out, but the timeouts are
pretty long.  In the 5 second range per message.

However, as you say, there are no limits on users or messages, and that
is perhaps a problem.  I mean, only root can send IPMI messages, and root
can do a lot more harm than that.  But it's probably bad in principle.
Nobody has ever reported this problem before.

This happens when the BMC communication of the device is abnormal (for example, the hardware is blocked) and a monitoring program keeps checking the BMC.

This scenario is often seen with automated monitoring tools.

Of course, the problem is fairly rare: about one hundred out of ten thousand machines, roughly 1%.

Anyway, a better solution for the kernel side of things, I think, would
be to add limits on the number of users and the number of messages per
user.  That's more in line with what other kernel things do.  I know of
nothing else in the kernel that does what you are proposing.

The precondition for adding limits is that people know that too many ipmi users and messages are causing problems. This patch lets the administrator know that.

In addition, different machines have different limits. My server may block 700,000 messages and be fine, while my NAS PC went OOM when it had blocked probably 10,000 messages. So, could limiting the number of users and messages wait until we have accumulated some online experience?
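
To make the idea of a per-user message limit with a per-machine tunable concrete, here is a minimal userspace sketch. It is only an illustration of the accounting under discussion, not code from ipmi_msghandler.c: the struct user_acct type, its fields, and the acct_* helpers are hypothetical names, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-user accounting; NOT the ipmi_msghandler.c structures. */
struct user_acct {
	unsigned int inflight;	/* messages queued and not yet completed */
	unsigned int max_msgs;	/* tunable cap, chosen per machine */
};

/* Set up accounting for one user with a caller-chosen cap. */
static void acct_init(struct user_acct *u, unsigned int max_msgs)
{
	assert(max_msgs > 0);
	u->inflight = 0;
	u->max_msgs = max_msgs;
}

/* Reserve a slot before queuing; false means the caller should back off. */
static bool acct_try_send(struct user_acct *u)
{
	if (u->inflight >= u->max_msgs)
		return false;
	u->inflight++;
	return true;
}

/* Release a slot when a response arrives or the message times out. */
static void acct_complete(struct user_acct *u)
{
	if (u->inflight > 0)
		u->inflight--;
}
```

With a cap of 2, a third send is refused until one of the earlier messages completes. Because max_msgs is a plain per-user field, a large server and a small NAS box could be given different values.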


Does that make sense?

-corey


thanks
--

Chen Guanqiao

This series provides a way to view the current number of users and messages in
ipmi, and introduces a simple interface to clear the message queue.
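
For illustration, a monitoring tool could read such a counter as sketched below. This is a minimal userspace sketch that assumes the attribute exposes a single unsigned decimal value. The read_counter() helper is a hypothetical name, and the attribute path is deliberately left to the caller because the attribute names are not listed in this cover letter.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * The path comes from the caller; no real attribute name is assumed.
 */
static int read_counter(const char *path, unsigned long *out)
{
	FILE *f;
	int parsed;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	parsed = (fscanf(f, "%lu", out) == 1);
	fclose(f);

	return parsed ? 0 : -1;
}
```

A tool could poll the user and message counters this way and raise an alarm when they keep growing, which is the visibility the series is meant to provide.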

Chen Guanqiao (3):
   ipmi: Get the number of user through sysfs
   ipmi: Get the number of message through sysfs
   ipmi: add a interface to clean message queue in sysfs

  drivers/char/ipmi/ipmi_msghandler.c | 159 ++++++++++++++++++++++++++++
  1 file changed, 159 insertions(+)

--
2.25.1
