Simone,

    I have fixed PETSc-dev 
http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html to support loading 
and storing dense matrices of any size in the native binary format. 

    I've attached a simple test problem; once you have petsc-dev installed, 
could you please run it and let me know if you have any problems.

    Thanks

    Barry

[see attached file: ex1.c]
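
    For reference, here is a minimal sketch of this kind of test. It is not the 
attached ex1.c, only an illustration: it uses the dimensions from your e-mail 
below, leaves the entries at zero (which is enough to exercise the binary 
writer), and is written in the style of current PETSc examples, whose routine 
names differ slightly from petsc-dev of this vintage (e.g. MatCreateMPIDense() 
rather than MatCreateDense()).

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Distributed dense matrix: 5085 rows x 737352 columns, default layout */
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                           5085, 737352, NULL, &A));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Store the matrix in PETSc native binary format; MatView() on an MPIDENSE
     matrix is the path that previously failed for very large matrices */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.dat",
                                  FILE_MODE_WRITE, &viewer));
  PetscCall(MatView(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}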


On Jun 3, 2011, at 11:30 AM, Barry Smith wrote:

> 
>  Simone,
> 
>     This is because we are trying to send messages that are too long for MPI 
> to handle. This is a problem with MPI for two reasons:
> 
> 1) MPI "count" arguments are always int. When we use 64-bit PetscInt (with 
> --with-64-bit-indices, PetscInt becomes long long int), we "may" pass count 
> values that are too large, and because C/C++ silently converts long long int 
> arguments to int, garbage values end up being passed to the MPI library. I 
> say "may" because this is only a problem when a count is so large that it 
> does not fit in an int.
> 
> 2) Even if the "count" values passed to MPI are valid int values, we've found 
> that none of the MPI implementations handle counts correctly once they are 
> within a factor of 4 or 8 of the largest value an int can hold. This is 
> because the MPI implementations improperly do things like convert a count to 
> a byte size by multiplying by sizeof(the type being passed) and store the 
> result in an int, where it won't fit. We've harassed the MPICH folks about 
> this, but they consider it a low priority to fix. (See the sketch after this 
> list for an illustration using the sizes of your matrix.)
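> 
>    To make this concrete, here is a small stand-alone sketch (plain C, no 
> PETSc or MPI needed) of both failure modes, using the sizes from your report. 
> With 12 processes each rank owns roughly 5085/12 = 424 rows, i.e. about 
> 424*737352 = 312,637,248 values; that count fits in an int, but the 
> corresponding byte count 312,637,248*8 = 2,501,097,984 does not, while with 
> 16 processes the local portion is about 1.9 GB and still fits. The even split 
> of rows over ranks is my assumption; the point is the arithmetic.
> 
> #include <stdio.h>
> #include <limits.h>
> 
> int main(void)
> {
>   /* Failure mode 1: a 64-bit count is silently converted to int when passed
>      where MPI expects an int (the result is implementation-defined; on
>      typical machines it wraps to a meaningless negative value). */
>   long long global = 5085LL * 737352LL;        /* 3,749,434,920 entries      */
>   int       as_int = (int)global;              /* what an int parameter sees */
>   printf("64-bit count %lld arrives as int %d\n", global, as_int);
> 
>   /* Failure mode 2: the count itself fits in an int, but converting it to a
>      byte count inside the MPI library (count * sizeof(double)) overflows. */
>   long long local = 424LL * 737352LL;          /* ~entries per rank, 12 ranks */
>   long long bytes = local * (long long)sizeof(double);
>   printf("local count %lld fits in int: %s; byte count %lld fits in int: %s\n",
>          local, local <= INT_MAX ? "yes" : "no",
>          bytes, bytes <= INT_MAX ? "yes" : "no");
>   return 0;
> }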
> 
>  In a few places where PETSc makes MPI calls we have started to be very 
> careful: we use only PetscMPIInt as count arguments, we explicitly check that 
> a PetscInt can be cast to a PetscMPIInt, and we generate an error if it won't 
> fit. We also replace a single call to MPI_Send() or MPI_Recv() with our own 
> routines MPILong_Send() and MPILong_Recv(), which make several calls to 
> MPI_Send() and MPI_Recv(), each small enough for MPI to handle. For example, 
> in MatView_MPIAIJ_Binary() we've updated the code to handle absurdly large 
> matrices that cannot use the MPI calls directly.
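> 
>  A rough sketch of the chunking idea follows; it is not the actual 
> MPILong_Send() source, just an illustration, and the chunk limit of 2^27 
> entries (1 GiB of doubles) is an assumed safe value. The receiver must make 
> the matching sequence of MPI_Recv() calls.
> 
> #include <mpi.h>
> 
> /* Send 'count' doubles in pieces small enough that neither the MPI count
>    argument nor the byte count computed inside the MPI library can overflow
>    a 32-bit int. */
> static int SendLong(double *buf, long long count, int dest, int tag,
>                     MPI_Comm comm)
> {
>   const long long chunk = 1LL << 27;   /* assumed safe chunk size (entries) */
>   long long       sent  = 0;
>   while (sent < count) {
>     long long remaining = count - sent;
>     int       n         = (int)(remaining < chunk ? remaining : chunk);
>     int       err       = MPI_Send(buf + sent, n, MPI_DOUBLE, dest, tag, comm);
>     if (err != MPI_SUCCESS) return err;
>     sent += n;
>   }
>   return MPI_SUCCESS;
> }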
> 
>  I will update the viewer and loader for MPIDense matrices to work correctly, 
> but you will have to test it in petsc-dev (not petsc-3.1). Also, I have no 
> machines with enough memory to do proper testing, so you will need to test 
> the code for me.
> 
> 
>   Barry
> 
> 
> 
> 
> On Jun 3, 2011, at 9:31 AM, Simone Re wrote:
> 
>> Dear Experts,
>>               I'm facing an issue when saving an MPI dense matrix.
>> 
>> My matrix has:
>> - 5085 rows
>> - 737352 columns
>> 
>> The crash occurs when I run the program using 12 CPUs (for instance, with 16 
>> CPUs everything is fine).
>> 
>> I built my program using both mvapich2 and Intel MPI 4 and it crashes in 
>> both cases.
>> 
>> When I run my original program built against Intel MPI 4 I get the following.
>> 
>> [4]PETSC ERROR: MatView_MPIDense_Binary() line 658 in 
>> src/mat/impls/dense/mpi/mpidense.c
>> [4]PETSC ERROR: MatView_MPIDense() line 780 in 
>> src/mat/impls/dense/mpi/mpidense.c
>> [4]PETSC ERROR: MatView() line 717 in src/mat/interface/matrix.c
>> [4]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see 
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
>> find memory corruption errors
>> [4]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and 
>> run
>> [4]PETSC ERROR: to get more information on the crash.
>> [4]PETSC ERROR: --------------------- Error Message 
>> ------------------------------------
>> [4]PETSC ERROR: Signal received!
>> [4]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 
>> CST 2010
>> [4]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [4]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [4]PETSC ERROR: See docs/index.html for manual pages.
>> [4]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> ...
>> 
>> Unfortunately, when I run the attached sample program I get the crash, but 
>> not the same error message.
>> I've also attached:
>> - the error I get from the sample program (built using mvapich2)
>> - configure.log
>> - the command line I used to invoke the sample program
>> 
>> Thanks and regards,
>>               Simone Re
>> 
>> Simone Re
>> Team Leader
>> Integrated EM Center of Excellence
>> WesternGeco GeoSolutions
>> via Celeste Clericetti 42/A
>> 20133 Milano - Italy
>> +39 02 . 266 . 279 . 246   (direct)
>> +39 02 . 266 . 279 . 279   (fax)
>> sre at slb.com
>> 
>> <for_petsc_team.tar.bz2>
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex1.c
Type: application/octet-stream
Size: 1689 bytes
Desc: not available
URL: 
<http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20110603/feac48c6/attachment.obj>
