Hey.

So, my server has a fairly boring LSI JBOD SAS controller running a
set of SATA disks, and it has recently started throwing a whole pile
of errors at me:

mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
mptbase: ioc0: LogInfo(0x31181000): Originator={PL}, Code={IO Cancelled Due to Recieve Error}, SubCode(0x1000)

I tend to get clusters of similar messages, and the timing varies a
lot.  Some people report that SMART commands trigger these errors,
but I can't find any SMART activity that lines up with mine.

Annoyingly, the messages don't seem to indicate which specific unit is
responsible, there is no useful documentation for decoding them, and
there are no utilities that show what on earth the root cause is.
Worse, the system reports no other visible errors, so presumably
whatever is going wrong does not propagate far enough up the kernel
to trigger any higher-level problems...
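
About the only thing that can be recovered from the messages themselves
is how the LogInfo value splits into fields.  A minimal Python sketch,
assuming the layout that the two messages above are consistent with
(top 4 bits bus type, next 4 bits originator, then an 8-bit code and a
16-bit subcode).  The function names and the originator table are
illustrative, not an official decoder, and none of these fields
identifies a particular disk:

```python
import re

# Assumed originator numbering (0=IOP, 1=PL, 2=IR); illustrative only.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

# Matches the 32-bit hex value inside "LogInfo(0x........)".
LOGINFO_RE = re.compile(r"LogInfo\(0x([0-9a-fA-F]{8})\)")


def decode_loginfo(value):
    """Split a 32-bit LogInfo value into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def decode_line(line):
    """Return the decoded fields for a log line, or None if no LogInfo."""
    match = LOGINFO_RE.search(line)
    if match is None:
        return None
    return decode_loginfo(int(match.group(1), 16))


if __name__ == "__main__":
    samples = [
        "mptbase: ioc0: LogInfo(0x31080000): Originator={PL}",
        "mptbase: ioc0: LogInfo(0x31181000): Originator={PL}",
    ]
    for sample in samples:
        fields = decode_line(sample)
        print(sample.split(":")[2].strip(), fields)
```

Feeding it the kernel log (for example the output of dmesg) and
counting the (code, subcode) pairs at least shows whether the clusters
are always the same pair of errors or a mix.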

So, can anyone advise on how to track down the root cause of these
problems?  This is a production system, so I don't especially want to
take it offline, and I can't see any externally visible problems that
these errors are causing...

Daniel
-- 
⎋ Puppet Labs Developer – http://puppetlabs.com
✉ Daniel Pittman <[email protected]>
✆ Contact me via gtalk, email, or phone: +1 (503) 893-2285
♲ Made with 100 percent post-consumer electrons
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug