I looked into the code of PetscLLCondensedCreate_Scalable:
...
ierr = PetscMalloc1(2*(nlnk_max+2),lnk);CHKERRQ(ierr);
...
and just for fun, I tried this:
#include <iostream>
int main() {
  int a = 1741445953;       // my number of unknowns...
  int b = 2*(a+2);          // overflows a 32-bit signed int
  unsigned long int c = b;  // the negative int converted to unsigned
  std::cout << " a: " << a << " b: " << b << " c: " << c << std::endl;
  return 0;
}
and it gives:
a: 1741445953 b: -812075386 c: 18446744072897476230
and in the PETSc error log I got this:
...
[0]PETSC ERROR: Memory requested 18446744070461249536
...
It really looks like there is an int somewhere that overflowed and
was then converted to an unsigned long...
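Just to double-check that reading (this is only a variation of my toy
example above, not PETSc code), forcing the product into 64-bit
arithmetic gives the value I would have expected:
#include <iostream>
int main() {
  int a = 1741445953;           // my number of unknowns, as above
  long long b = 2LL * (a + 2);  // multiply in 64 bits: no wraparound
  std::cout << " a: " << a << " b: " << b << std::endl;
  return 0;
}
which prints " a: 1741445953 b: 3482891910".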
Thanks,
Eric
On 16/11/15 01:26 PM, Eric Chamberland wrote:
Barry,
I can't launch the code again to retrieve more information, since I
am not allowed to: the cluster has around 780 nodes and I was given
very special permission to reserve 530 of them...
So the best I can do is to give you the backtrace PETSc gave me... :/
(see the first post with the backtrace:
http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html)
And until today, all smaller meshes with the same solver completed
successfully... (I went up to 219 million unknowns on 64 nodes).
I understand, then, that some use of PetscInt64 in the current code
could help fix problems like the one I hit. I realize it is a big
challenge to track down every occurrence of this kind of overflow in
the code, given the size of the systems needed to reproduce the
problem...
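Just to illustrate what I have in mind (a rough sketch with made-up
names, not the actual PETSc code, using my number of unknowns as the
stand-in for nlnk_max): the count could be computed in a 64-bit type
and checked before it is ever used as a 32-bit index:
#include <climits>
#include <cstdint>
#include <iostream>
// Hypothetical sketch: compute 2*(nlnk_max+2) in 64 bits, as a
// PetscInt64-style fix would, and say whether it fits in 32 bits.
static bool lnk_count_fits_32bit(int nlnk_max, std::int64_t *count) {
  *count = 2 * (static_cast<std::int64_t>(nlnk_max) + 2);  // no wraparound
  return *count <= INT_MAX;
}
int main() {
  std::int64_t count = 0;
  bool fits = lnk_count_fits_32bit(1741445953, &count);
  std::cout << " count: " << count
            << " fits in 32-bit int: " << fits << std::endl;
  return 0;
}
With my value this reports that the count (3482891910) does not fit
in a 32-bit int.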
Eric
On 16/11/15 12:40 PM, Barry Smith wrote:
Eric,
The behavior you get with bizarre integers and a crash is not the
behavior we want. We would like to detect these overflows
appropriately. If you can track through the error and determine the
location where the overflow occurs, then we would gladly put in
additional checks and use of PetscInt64 to handle these cases better.
So let us know the exact cause and we'll improve the code.
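As an illustration only (this is not code that is in PETSc today),
such a check could be as simple as rejecting, before the
multiplication is done, any nlnk_max for which 2*(nlnk_max+2) cannot
be represented in a 32-bit int:
#include <climits>
#include <cstdio>
// Illustration only: detect, before multiplying, that 2*(nlnk_max+2)
// would overflow a 32-bit int, and fail cleanly instead of wrapping.
int main() {
  int nlnk_max = 1741445953;    // e.g. the value from Eric's report
  if (nlnk_max > (INT_MAX - 4) / 2) {
    std::fprintf(stderr, "overflow: nlnk_max = %d is too large "
                         "for 32-bit indices\n", nlnk_max);
    return 1;
  }
  int n = 2 * (nlnk_max + 2);   // safe: cannot overflow after the check
  std::printf("n = %d\n", n);
  return 0;
}
A real fix would also need to decide what to do in that case, e.g.
report the overflow as an error or fall back to 64-bit arithmetic.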
Barry
On Nov 16, 2015, at 11:11 AM, Eric Chamberland
<[email protected]> wrote:
On 16/11/15 10:42 AM, Matthew Knepley wrote:
Sometimes when we do not have exact counts, we need to overestimate
sizes. This is especially true
in sparse MatMat.
OK... so, to be sure: am I correct in saying that recompiling PETSc
with "--with-64-bit-indices" is the only solution to my problem?
I mean, is there no other fix for this overestimation in a more
recent release of PETSc, like putting the result in a "long int"
instead?
Thanks,
Eric