> On Aug 24, 2020, at 2:34 PM, Jed Brown <[email protected]> wrote:
> 
> I'm thinking of something such as writing floating point data into the return 
> address, which would be unaligned/garbage.

  OK, my patch will detect this. This is what I was talking about: corrupting 
the BLAS arguments, which are the addresses of arrays.

  Valgrind is by far the preferred approach.

  Barry

  Another feature we could add to the malloc checking: when a SEGV or BUS 
error is encountered and we catch it, we should run PetscMallocVerify() to 
check our memory for corruption and report any we find.
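
  For illustration only, here is a minimal sketch of that idea in plain C. It 
is not PETSc code: guarded_malloc(), guarded_verify(), and 
install_crash_check() are hypothetical names, and the fixed-size tracking 
table and guard pattern are assumptions made to keep the example short. Each 
allocation gets a guard word on either side; when SIGSEGV or SIGBUS is 
caught, the handler rescans the guards and reports whether any were 
overwritten.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigaction() and write() */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical guard pattern written before and after every user buffer. */
static const unsigned char GUARD[8] = {0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0xF0, 0x0D};

enum { MAX_TRACKED = 64 }; /* assumption: small fixed table, never freed */
static unsigned char *tracked_raw[MAX_TRACKED];
static size_t         tracked_len[MAX_TRACKED];
static int            ntracked = 0;

/* Allocate n bytes with a guard on each side and remember the block.
   The 8-byte leading guard keeps the returned pointer 8-byte aligned. */
void *guarded_malloc(size_t n)
{
  unsigned char *raw;
  if (ntracked == MAX_TRACKED) return NULL;
  raw = malloc(n + 2 * sizeof GUARD);
  if (!raw) return NULL;
  memcpy(raw, GUARD, sizeof GUARD);
  memcpy(raw + sizeof GUARD + n, GUARD, sizeof GUARD);
  tracked_raw[ntracked] = raw;
  tracked_len[ntracked] = n;
  ntracked++;
  return raw + sizeof GUARD;
}

/* Byte-by-byte comparison, so the check uses no library calls. */
static int guard_intact(const unsigned char *p)
{
  size_t i;
  for (i = 0; i < sizeof GUARD; i++)
    if (p[i] != GUARD[i]) return 0;
  return 1;
}

/* Return the number of damaged guards over all tracked blocks. */
int guarded_verify(void)
{
  int i, bad = 0;
  for (i = 0; i < ntracked; i++) {
    if (!guard_intact(tracked_raw[i])) bad++;
    if (!guard_intact(tracked_raw[i] + sizeof GUARD + tracked_len[i])) bad++;
  }
  return bad;
}

/* On a fatal signal: rescan the guards, report, and exit without returning. */
static void crash_handler(int sig)
{
  static const char ok[]  = "fatal signal: guard words intact\n";
  static const char bad[] = "fatal signal: guard words corrupted\n";
  ssize_t r;
  if (guarded_verify() == 0) r = write(STDERR_FILENO, ok, sizeof ok - 1);
  else                       r = write(STDERR_FILENO, bad, sizeof bad - 1);
  (void)r;
  _exit(128 + sig);
}

/* Install the handler for both signals; returns 0 on success, -1 on failure. */
int install_crash_check(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = crash_handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
  if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;
  return 0;
}
```

  A program would call install_crash_check() once at startup and allocate 
through guarded_malloc(). The scan can only say that some guard was 
overwritten before the crash, not which write did it, and if the tracking 
table itself has been corrupted the handler may fault again, which is one 
more reason to prefer Valgrind when the failure can be reproduced there.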



> 
> Reproducing under Valgrind would help a lot.  Perhaps it's possible to 
> checkpoint such that the breakage can be reproduced more quickly?
> 
> Barry Smith <[email protected]> writes:
> 
>> https://en.wikipedia.org/wiki/Bus_error
>> 
>> But perhaps not true for Intel? 
>> 
>> 
>> 
>>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley <[email protected]> wrote:
>>> 
>>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith <[email protected]> wrote:
>>> 
>>> 
>>>> On Aug 24, 2020, at 12:39 PM, Jed Brown <[email protected]> wrote:
>>>> 
>>>> Barry Smith <[email protected]> writes:
>>>> 
>>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown <[email protected]> wrote:
>>>>>> 
>>>>>> Barry Smith <[email protected]> writes:
>>>>>> 
>>>>>>> So if a BLAS routine fails with SIGBUS, is it always an input error, 
>>>>>>> i.e. improper double/complex alignment? Or some other very strange 
>>>>>>> thing?
>>>>>> 
>>>>>> I would suspect memory corruption.
>>>>> 
>>>>> 
>>>>> Corruption meaning what specifically?
>>>>> 
>>>>> The routine crashing is dgemv, which only takes double-precision arrays; 
>>>>> regardless of what garbage is in those arrays, I don't think BUS errors 
>>>>> can result. It doesn't take integer arrays whose corruption could 
>>>>> result in bad indexing and then BUS errors.
>>>>> 
>>>>> So then it can only be corruption of the pointers passed in, correct?
>>>> 
>>>> Such as those pointers pointing into data on the stack with incorrect 
>>>> sizes.
>>> 
>>> But won't incorrect sizes "usually" lead to SEGV, not SIGBUS?
>>> 
>>> My rough understanding was that memory errors in the heap are SEGV and 
>>> memory errors on the stack are SIGBUS. Is that not true?
>>> 
>>>   Matt
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
