On Mon, Aug 05, 2019 at 11:18:50AM -0700, Tejun Heo wrote: > Hello, Corey. > > On Thu, Aug 01, 2019 at 12:40:02PM -0500, Corey Minyard wrote: > > I spent some time looking at this. Without the patch, I > > measured around 3.5ms to send/receive a get device ID message > > and uses 100% of the CPU on a core. > > > > I measured your patch, it slowed it down to around 10.5ms > > per message, which is not good. Though it did just use a > > few percent of the core. > > > > I wrote some code to auto-adjust the timer. It settled on > > a delay around 35us, which gave 4.7ms per message, which is > > probably acceptable, and used around 40% of the CPU. If > > I use any timeout (even a 0-10us range) it won't go below > > 4ms per message. > > Out of curiosity, what makes 4.7ms okay and 10.5ms not? At least for > the use case we have in the fleet (sensor reading mostly), the less > disruptive the better as long as things don't timeout and fail.
Well, when you are loading firmware and it takes 10 minutes at max speed, taking 20-30 minutes is a lot worse. It's not reading sensors, which would be fine, it's tranferring large chunks of data. > > > The process is running at nice 19 priority, so it won't > > have a significant effect on other processes from a priority > > point of view. It may still be hitting the scheduler locks > > pretty hard, though. But I played with things quite a bit > > And power. Imagine multi six digit number of machines burning a full > core just because of this busy loop to read temperature sensors some > msecs faster. > > > and the behavior or the management controller is too > > erratic to set a really good timeout. Maybe other ones > > are better, don't know. > > > > One other option we have is that the driver has something > > called "maintenance mode". If it detect that you have > > reset the management controller or are issuing firmware > > commands, it will modify timeout behavior. It can also > > be activated manually. I could also make it switch to > > just calling schedule instead of delaying when in that > > mode. > > Yeah, whatever which makes the common-case behavior avoid busy looping > would work. Ok, it's queued in linux-next now (and has been for a few days). I'll get it into the next kernel release (and I just noticed a spelling error and fixed it). -corey > > > The right thing it do is complain bitterly to vendors who > > build hardware that has to be polled. But besides that, > > For sure, but there already are a lot of machines with this thing and > it'll take multiple years for them to retire so... > > > I'm thinking the maintenance mode is the thing to do. > > It will also change behavior if you reset the management > > controller, but only for 30 seconds or so. Does that > > work? > > Yeah, sounds good to me. > > Thnaks. > > -- > tejun _______________________________________________ Openipmi-developer mailing list Openipmi-developer@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openipmi-developer