Simone,
I have fixed PETSc-dev
(http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html) to support loading
and storing dense matrices of any size in native format.
I've attached a simple test problem; once you get it installed, could you
please test it and let me know if you have any problems?
Thanks
Barry
[see attached file: ex1.c]
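
For reference, a minimal sketch of this kind of test (this is not the attached
ex1.c; the matrix size is the one reported in the quoted message below, the
file name is made up, and the calls follow the petsc-3.1-era interface, which
petsc-dev may have changed):

   #include <petscmat.h>

   /* Sketch only: write an MPIDENSE matrix of the reported size to a PETSc
      binary file, which exercises MatView_MPIDense_Binary(). */
   int main(int argc, char **argv)
   {
     Mat            A;
     PetscViewer    viewer;
     PetscErrorCode ierr;

     ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

     /* 5085 x 737352 dense matrix distributed over PETSC_COMM_WORLD;
        PETSc allocates the local storage */
     ierr = MatCreateMPIDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                              5085, 737352, PETSC_NULL, &A);CHKERRQ(ierr);
     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

     /* store in PETSc's native binary format */
     ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.bin",
                                  FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
     ierr = MatView(A, viewer);CHKERRQ(ierr);

     /* Destroy calls are omitted because their calling convention differs
        between petsc-3.1 and petsc-dev */
     ierr = PetscFinalize();CHKERRQ(ierr);
     return 0;
   }
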
On Jun 3, 2011, at 11:30 AM, Barry Smith wrote:
>
> Simone,
>
> This is because we are trying to send messages too long for MPI to
> handle. This is a problem for MPI for two reasons
>
> 1) MPI "count" arguments are always int, when we use 64 bit PetscInt (because
> of the --with-64-bit-indices PetscInt becomes long long int) this means we
> "may" be passing values too large as count values to MPI and because C/C++
> automatically castes long long int arguments to int it ends up passing
> garbage values to the MPI libraries. Now I say "may" because this is only a
> problem if a count happens to be so large it won't fit in an int.
>
> 2) Even if the "count" values passed to MPI are correct int values, we've
> found that none of the MPI implementations handle "counts" correctly when
> they are within a factor of 4 or 8 of the largest value allowed in an int.
> This is because the MPI implementations improperly do things like convert
> from count to byte size by multiplying by sizeof(the type being passed) and
> store the result in an int (where it won't fit). We've harassed the MPICH
> folks about this, but they consider it a low priority to fix. (Both failure
> modes are sketched in the short example after this list.)
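>
> A minimal sketch of those two failure modes (the counts below are invented
> for illustration and this is not PETSc code):
>
>    #include <mpi.h>
>    #include <stdio.h>
>
>    int main(int argc, char **argv)
>    {
>      int rank;
>      MPI_Init(&argc, &argv);
>      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>      if (rank == 0) {
>        /* (1) MPI_Send()'s count parameter is an int, so a long long count
>               is silently narrowed; passing n straight to
>               MPI_Send(buf, n, MPI_DOUBLE, ...) hands MPI this value */
>        long long n = 3000000000LL;               /* > INT_MAX */
>        printf("count %lld narrows to int %d\n", n, (int)n);
>
>        /* (2) even a count that fits in an int overflows once an MPI
>               implementation converts it to a byte length held in an int
>               (formally undefined; in practice it wraps) */
>        int count = 600000000;                    /* fits in an int ...    */
>        int bytes = count * (int)sizeof(double);  /* ... ~4.8e9 does not   */
>        printf("byte length computed in an int: %d\n", bytes);
>      }
>
>      MPI_Finalize();
>      return 0;
>    }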
>
> In a few of the places where PETSc makes MPI calls we have started to be very
> careful: we use only PetscMPIInt as count arguments to MPI calls, and we
> explicitly check that we can cast from PetscInt to PetscMPIInt, generating an
> error if the value won't fit. We also replace a single call to MPI_Send() or
> MPI_Recv() with our own routines MPILong_Send() and MPILong_Recv(), which make
> several calls to MPI_Send() and MPI_Recv(), each small enough for MPI to
> handle. For example, in MatView_MPIAIJ_Binary() we've updated the code to
> handle absurdly large matrices that cannot use the MPI calls directly.
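>
> A rough sketch of the chunking idea (this is not the actual MPILong_Send();
> the chunk size and the use of MPI_DOUBLE are just assumptions made for the
> illustration):
>
>    #include <mpi.h>
>
>    /* Split one logically long message into several MPI_Send() calls whose
>       int counts stay far enough below INT_MAX that the implementation's
>       internal count-to-bytes conversion cannot overflow either.  The
>       receiver must loop over MPI_Recv() with the same chunk size. */
>    static int send_long(double *buf, long long count, int dest, int tag,
>                         MPI_Comm comm)
>    {
>      const long long chunk = 64LL * 1024 * 1024;   /* 64M doubles = 512 MB */
>      long long       sent  = 0;
>      while (sent < count) {
>        long long m   = count - sent;
>        int       len = (int)(m < chunk ? m : chunk); /* always fits an int */
>        int       err = MPI_Send(buf + sent, len, MPI_DOUBLE, dest, tag, comm);
>        if (err != MPI_SUCCESS) return err;
>        sent += len;
>      }
>      return MPI_SUCCESS;
>    }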
>
> I will update the viewer and loader for MPIDense matrices to work correctly,
> but you will have to test it in petsc-dev (not petsc-3.1). Also, I have no
> machines with enough memory to do proper testing, so you will need to test the
> code for me.
>
>
> Barry
>
>
>
>
> On Jun 3, 2011, at 9:31 AM, Simone Re wrote:
>
>> Dear Experts,
>> I'm facing an issue when saving an MPI dense matrix.
>>
>> My matrix has:
>>
>> - 5085 rows
>>
>> - 737352 columns
>> and the crash occurs when I run the program using 12 CPUs (for instance with
>> 16 CPUs everything is fine).
>>
>> I built my program using both mvapich2 and Intel MPI 4 and it crashes in
>> both cases.
>>
>> When I run my original program built against Intel MPI 4 I get the following.
>>
>> [4]PETSC ERROR: MatView_MPIDense_Binary() line 658 in
>> src/mat/impls/dense/mpi/mpidense.c
>> [4]PETSC ERROR: MatView_MPIDense() line 780 in
>> src/mat/impls/dense/mpi/mpidense.c
>> [4]PETSC ERROR: MatView() line 717 in src/mat/interface/matrix.c
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [4]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
>> run
>> [4]PETSC ERROR: to get more information on the crash.
>> [4]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [4]PETSC ERROR: Signal received!
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37
>> CST 2010
>> [4]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [4]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [4]PETSC ERROR: See docs/index.html for manual pages.
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> ...
>>
>> Unfortunately, when I run the sample program attached, I get the crash but I
>> don't get the same error message.
>> I've attached also:
>>
>> - the error I get from the sample program (built using mvapich2)
>>
>> - configure.log
>>
>> - the command line I used to invoke the sample program
>>
>> Thanks and regards,
>> Simone Re
>>
>> Simone Re
>> Team Leader
>> Integrated EM Center of Excellence
>> WesternGeco GeoSolutions
>> via Celeste Clericetti 42/A
>> 20133 Milano - Italy
>> +39 02 . 266 . 279 . 246 (direct)
>> +39 02 . 266 . 279 . 279 (fax)
>> sre at slb.com
>>
>>
>> <for_petsc_team.tar.bz2>
>
[Attachment: ex1.c,
<http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20110603/feac48c6/attachment.obj>]