Marius,
I tested your code with petsc-release on my mac laptop using np=2 cores. I 
first tested a small matrix data file successfully. Then I switch to your data 
file and run out of memory, likely due to the dense matrices B and X. I got an 
error "Your system has run out of application memory" from my laptop.

The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B 
and X with the same size -- a huge memory requirement!
By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the code run 
well with np=2. Note the error message you got
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range

The modified code I used is attached.
Hong
________________________________
From: Marius Buerkle <[email protected]>
Sent: Tuesday, October 27, 2020 10:01 PM
To: Zhang, Hong <[email protected]>
Cc: [email protected] <[email protected]>; Sherry Li 
<[email protected]>
Subject: Aw: Re: [petsc-users] superlu_dist segfault

Hi,

I recompiled PETSC with debug option, now I get a seg fault at a different 
position

[23]PETSC ERROR: 
------------------------------------------------------------------------
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[23]PETSC ERROR: or see 
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors
[23]PETSC ERROR: likely location of problem given in stack below
[23]PETSC ERROR: ---------------------  Stack Frames 
------------------------------------
[23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[23]PETSC ERROR:       INSTEAD the line number of the start of the function
[23]PETSC ERROR:       is given.
[23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 
/home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 
/home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[23]PETSC ERROR: [23] MatMatSolve line 3466 
/home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c
[23]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[23]PETSC ERROR: Signal received

I  made a small reproducer. The matrix is a bit too big so I cannot attach it 
directly to the email, but I put it in the cloud
https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw

Best,
Marius


Gesendet: Dienstag, 27. Oktober 2020 um 23:11 Uhr
Von: "Zhang, Hong" <[email protected]>
An: "Marius Buerkle" <[email protected]>, "[email protected]" 
<[email protected]>, "Sherry Li" <[email protected]>
Betreff: Re: [petsc-users] superlu_dist segfault
Marius,
It fails at the line 1075 in file 
/home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c
    if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * 
sizeof(doublecomplex))))     ABORT("Malloc fails for lsum[].");

We do not know what it means. You may use a debugger to check the values of the 
variables involved.
I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone 
short code that reproduce the error. We can help on its investigation.
Hong


________________________________
From: petsc-users <[email protected]> on behalf of Marius Buerkle 
<[email protected]>
Sent: Tuesday, October 27, 2020 8:46 AM
To: [email protected] <[email protected]>
Subject: [petsc-users] superlu_dist segfault

Hi,

When using MatMatSolve with superlu_dist I get a segmentation fault:

Malloc fails for lsum[]. at line 1075 in file 
/home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c

The matrix size is not particular big and I am using the petsc release branch 
and superlu_dist is v6.3.0 I think.

Best,
Marius
#include <petsc.h>
#include <mpi.h>

static char help[] = "superlu_dist seg fault test.\n\n";

int main(int argc,char **args)
{
  Mat  A, B, X, F;
  KSP  ksp;
  PC  pc;
  PetscInt  inode, nnodes;
  PetscViewer  v_file;
  PetscScalar  pp;
  PetscErrorCode  ierr;

  ierr = PetscInitialize(&argc, &args, NULL, help);  CHKERRQ(ierr);

  MPI_Comm_size(PETSC_COMM_WORLD, &nnodes);
  MPI_Comm_rank(PETSC_COMM_WORLD, &inode);

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATAIJ);

  PetscViewerBinaryOpen(PETSC_COMM_WORLD,"mat_a.dat",FILE_MODE_READ,&v_file);
  ierr = MatLoad(A, v_file);
  printf("MatLoad %i\n", ierr);  CHKERRQ(ierr);
  PetscViewerDestroy(&v_file);

  PetscInt    M,N,m,n,rstart,rend,cols[2],i,nrhs=4000;
  PetscScalar vals[2];
  ierr = MatGetSize(A,&M,&N);CHKERRQ(ierr);
  ierr = MatGetLocalSize(A,&m,&n);CHKERRQ(ierr);
  printf("A: %d, %d, %d %d\n",m,n,M,N);

  ierr = 
MatCreateDense(PETSC_COMM_WORLD,m,PETSC_DECIDE,PETSC_DECIDE,nrhs,NULL,&B);CHKERRQ(ierr);
  ierr = 
MatCreateDense(PETSC_COMM_WORLD,m,PETSC_DECIDE,PETSC_DECIDE,nrhs,NULL,&X);CHKERRQ(ierr);
  //MatZeroEntries(B);
  //MatZeroEntries(X);

  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  for (i=rstart; i<rend; i++) {
    cols[0] = 0; cols[1] = 1; vals[0] = 1.0; vals[1] = 1.0;
    ierr = MatSetValues(B,1,&i,2,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(X,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(X,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  //pp = 1.0;
  //MatShift(B, pp);
  //MatView(B,PETSC_VIEWER_STDOUT_WORLD);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);

  KSPSetType(ksp, KSPPREONLY);
  KSPGetPC(ksp, &pc);

  PCSetType(pc, PCLU);
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);

  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);

  ierr = PCFactorGetMatrix(pc, &F);  CHKERRQ(ierr);
  printf("PCFactorGetMatrix %i\n", ierr);

  ierr = MatMatSolve(F, B, X);CHKERRQ(ierr);
  printf("MatMatSolve %i\n", ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = MatDestroy(&X);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Reply via email to