On Sat, Jan 11, 2014 at 12:09 PM, Edward Ned Harvey (lopser) <lop...@nedharvey.com> wrote:

> We have this in a dozen machines.  For several days, they all work fine.
>  And then we get, from a random machine each time, cron failure email
> "/bin/date: command not found" or /usr/bin/test, or /usr/bin/hostid, or any
> random one of those commands.  The machine will have to reboot in order to
> make the problem go away.  I tracked it down to I/O error recorded in the
> system log.  The only explanation can be a fault in the storage backend,
> plus caching to make it keep failing on subsequent calls.
>

This smells to me a whole lot like the kind of stuff we see when Linux's
dentry cache gets messed up. It seems to happen a lot with some network
filesystems, apparently because the kernel makes incorrect assumptions and
fails to flush its cache and retry when it probably should. (It usually
leaves me wondering why anyone trusts Linux in production....)
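
One rough way to gather evidence the next time it happens, before rebooting,
is to stat the affected binaries and record which errno comes back. The
sketch below is only an illustration: the path list is taken from the cron
failures quoted above, and the helper names (probe, probe_all) are made up
for this example. A lookup that returns ENOENT for a file that is known to
exist would be consistent with a bad cached directory entry, while EIO would
point more directly at the storage backend.

```python
import errno
import os

# Paths taken from the cron failure messages quoted above; adjust as needed.
PATHS = ["/bin/date", "/usr/bin/test", "/usr/bin/hostid"]


def probe(path):
    """Return 'ok' if stat() succeeds, else the symbolic errno name."""
    try:
        os.stat(path)
    except OSError as exc:
        # Map the numeric errno (e.g. 2, 5) to its name (e.g. 'ENOENT', 'EIO').
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


def probe_all(paths):
    """Probe every path and return a {path: status} mapping."""
    return {path: probe(path) for path in paths}


if __name__ == "__main__":
    for path, status in probe_all(PATHS).items():
        print(f"{path}: {status}")
```

If a probe does report ENOENT for a binary that is really there, writing 2 to
/proc/sys/vm/drop_caches as root asks the kernel to free reclaimable dentries
and inodes. Whether that clears this particular failure without a reboot is
an open question, not something I have verified.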

-- 
brandon s allbery kf8nh                               sine nomine associates
allber...@gmail.com                                  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
