On Mon, 24 Aug 2020, Barry Smith wrote:

>
>
> > On Aug 24, 2020, at 12:31 PM, Jed Brown <[email protected]> wrote:
> >
> > Barry Smith <[email protected]> writes:
> >
> >> So if a BLAS errors with SIGBUS then it is always an input error of just
> >> not proper double/complex alignment? Or some other very strange thing?
> >
> > I would suspect memory corruption.
>
>
> Corruption meaning what specifically?
>
> The routines crashing are dgemv which only take double precision arrays,
> regardless of what garbage is in those arrays i don't think there can be BUS
> errors resulting. They don't take integer arrays whose corruption could
> result in bad indexing and then BUS errors.
>
> So then it can only be corruption of the pointers passed in, correct?

My wild guess here is - some hardware is misbehaving [on severe load/overheating/insufficient-cooling]. Some errors should be detected/corrected by ECC RAM - but perhaps not all failures get detected?

Satish
