Bug#703356: megasas: Failed to alloc kernel SGL buffer for IOCTL (ref.#688198)

2013-06-10 Thread Jon Schewe
I noticed in the last reply to this bug that the MegaRAID Storage Manager
is suspect. I'm running Ubuntu with a 3.5.0-32 kernel and see this same
behavior when using the MegaCli64 command line tool. I run this tool
through cron each hour to grab the logs from the RAID controller and put
them into syslog. Everything was fine for a day or so, but now
every time I run the tool I get an error message about the SGL buffer.
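
To pin down when the failures started, the log can be searched for the driver message. What follows is only a minimal sketch: it assumes the message text matches the bug title and that the kernel log lines are available as plain text. The helper names are hypothetical.

```python
# Message text taken from the bug title; the exact prefix in the log may differ.
SGL_ERROR = "Failed to alloc kernel SGL buffer for IOCTL"


def sgl_failures(log_lines):
    """Return the log lines that contain the SGL allocation failure message."""
    return [line.rstrip("\n") for line in log_lines if SGL_ERROR in line]


def first_failure_index(log_lines):
    """Index of the first failing line, or None if the message never appears."""
    for i, line in enumerate(log_lines):
        if SGL_ERROR in line:
            return i
    return None
```

Both helpers accept any iterable of lines, so an open log file can be passed directly. Which file the message ends up in depends on the syslog configuration.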

I believe this appeared in the latest kernel update for Ubuntu. Perhaps a
similar patch was applied to both Debian and Ubuntu recently?


-- 
http://mtu.net/~jpschewe


Bug#703356: megasas: Failed to alloc kernel SGL buffer for IOCTL (ref.#688198)

2013-03-19 Thread Bjørn Mork
Jean-Francois Chevrette jf.cr...@gmail.com writes:

 Package: src:linux
 Version: 3.2.39-2
 Severity: important

 (first time submitting to a bug report, sorry if I missed anything)

 We are still affected by bug #688198

Yes, I see that it was closed after applying a related bugfix.  But as I
noted in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688198#25 the
reported bug would not be fixed by this after all.  The fixed bug was
real, but unrelated to the reported one.

 We have other seemingly identical servers (hardware and software) and not
 all of them have this problem.

 Is there anything else I can provide to help?

The message indicates a memory allocation problem related to sending
management commands from userspace to the driver/controller.  Management
commands are e.g. requests from smartctl, raid monitoring etc.

All data transferred between these userspace applications and the
controller must be copied to/from dma-coherent buffers for transfer to
the controller, and it is the allocation of these buffers which fails.
Either because the requests are so bogus (too many or too big) that they
just cannot be serviced, or because the system is out of memory in the
appropriate pool.

Maybe we can get some ideas about why this fails if you describe the
conditions you experience the problem under.  I believe the fact that
you only see this on some of the otherwise identical servers is very
interesting. If we could find some pattern here, then that would help.
Is there some special monitoring application running on the failing
servers?  Are there other devices in these servers which may have
drivers eating memory?

I can't, but maybe the Debian kernel gurus can read something out of 

 /proc/slabinfo 
 /proc/buddyinfo
 /proc/pagetypeinfo

Comparing those files on a failing server and a non-failing server would
certainly be interesting.
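
As a rough illustration of such a comparison, here is a minimal sketch that summarizes /proc/buddyinfo so the output from two servers can be put side by side. It assumes the usual "Node N, zone NAME count count ..." layout with one count per allocation order, and it assumes (not confirmed in this report) that a shortage of larger contiguous free blocks is relevant to the failure. The function names are hypothetical.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to the list of free-block counts, index = order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal 12 5 3 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_pages_at_or_above(counts, min_order):
    """Pages held in free blocks of at least 2**min_order pages."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the buddy allocator statistics of the running system."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

Running read_buddyinfo() on a failing and a non-failing server and comparing free_pages_at_or_above(counts, 2) for each zone would show whether the failing one has noticeably fewer multi-page free blocks. The choice of order 2 is arbitrary and only for illustration.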



Bjørn


--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87zjxz241l@nemi.mork.no



Bug#703356: megasas: Failed to alloc kernel SGL buffer for IOCTL (ref.#688198)

2013-03-19 Thread Bjørn Mork
Jean-Francois Chevrette jf.cr...@gmail.com writes:
 On Tue, Mar 19, 2013 at 4:21 AM, Bjørn Mork bj...@mork.no wrote:

 Maybe we can get some ideas about why this fails if you describe the
 conditions you experience the problem under.

 This server is running Xen 4.1 and a single VM. Nothing fancy there.
 It's also running DRBD to replicate a device to another server. It's
 also running a few userland tools for monitoring (nagios) and graphing
 (munin). Other than that nothing fancy.

 Nagios is the one calling MegaCli to monitor the array consistency.

 One thing to note is that after a server reboot, the MegaCli tool
 works fine for a while. This does sound like there's a leak somewhere.

 I just found out that this server is also running a service called
 MegaRAID Storage Manager which is a tool provided by LSI to manage the
 array through a java GUI. Maybe this tool is somehow causing this
 problem.

That sounds like a very likely suspect, yes.

 Stopping it didn't solve the problem. I'll try disabling the
 tool and rebooting without ever starting it to see if the problem occurs
 again.

Good.  If that works then we probably should find out what this tool
does to trigger the problem, so that it can be handled properly by the
driver.


Bjørn

