On Mon, 24 Aug 2020, Barry Smith wrote:

> 
> 
> > On Aug 24, 2020, at 12:31 PM, Jed Brown <[email protected]> wrote:
> > 
> > Barry Smith <[email protected]> writes:
> > 
> >>  So if a BLAS routine fails with SIGBUS, is it always an input error, i.e.,
> >> just improper double/complex alignment? Or some other very strange thing?
> > 
> > I would suspect memory corruption.
> 
> 
>   Corruption meaning what specifically?
> 
>   The routines crashing are dgemv, which only take double-precision arrays;
> regardless of what garbage is in those arrays, I don't think it can result in
> BUS errors. They don't take integer arrays whose corruption could result in
> bad indexing and then BUS errors.
> 
>   So then it can only be corruption of the pointers passed in, correct?

My wild guess here is that some hardware is misbehaving [under severe
load/overheating/insufficient cooling]. Some errors should be
detected/corrected by ECC RAM, but perhaps not all failures get
detected?

Satish
