Ashwin,

The valgrind logs clearly indicate you are trying to access memory that was already free'd.


For example:

[1,0]<stderr>:==4683== Invalid read of size 4
[1,0]<stderr>:==4683==    at 0x795DC2: __src_input_MOD_organize_input (src_input.f90:2318)
[1,0]<stderr>:==4683==  Address 0xb4001d0 is 0 bytes inside a block of size 24 free'd
[1,0]<stderr>:==4683==    by 0x63F3690: free_NC_var (in /usr/local/lib/libnetcdf.so.11.0.3)
[1,0]<stderr>:==4683==    by 0x63BB431: nc_close (in /usr/local/lib/libnetcdf.so.11.0.3)
[1,0]<stderr>:==4683==    by 0x435A9F: __io_utilities_MOD_close_file (io_utilities.f90:995)
[1,0]<stderr>:==4683==  Block was alloc'd at
[1,0]<stderr>:==4683==    by 0x63F378C: new_x_NC_var (in /usr/local/lib/libnetcdf.so.11.0.3)
[1,0]<stderr>:==4683==    by 0x63BAF85: nc_open (in /usr/local/lib/libnetcdf.so.11.0.3)
[1,0]<stderr>:==4683==    by 0x547E6F6: nf_open_ (nf_control.F90:189)

So the double-free error could be a side effect of this.
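To illustrate the failure mode, here is a minimal, generic C sketch (not COSMO code) of the pattern valgrind is reporting: a read through a pointer into a block that was already freed, followed by a second free of the same block.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *buf = malloc(6 * sizeof *buf);  /* block of size 24 alloc'd    */
        buf[0] = 42;
        free(buf);                           /* block free'd (cf. nc_close) */
        printf("%d\n", buf[0]);              /* Invalid read of size 4      */
        free(buf);                           /* double free or corruption   */
        return 0;
    }

In your case the block is freed inside nc_close(), so the application most likely keeps a reference into the netCDF internals after the file is closed and later reads (or frees) it again.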

At this stage, I suggest you fix your application and see if that resolves your issue
(e.g. there is no need to try another MPI library and/or version for now).

Cheers,

Gilles

On 6/18/2017 2:41 PM, ashwin .D wrote:
Hello Gilles,
First of all, I am extremely grateful for this communication from you on a weekend, and only a few hours after I posted my email. Well, I am not sure I can go on posting log files, as you rightly point out that MPI is not the source of the problem. Still, I have enclosed the valgrind log files as you requested. I have downloaded the MPICH packages as you suggested and I am going to install them shortly. But before I do that, I think I have a clue about the source of my problem (double free or corruption) and I would really appreciate your advice.
As I mentioned before, COSMO has been compiled with mpif90 for shared-memory usage and with gfortran for sequential access. But it depends on a lot of external third-party software such as zlib, libcurl, HDF5, netCDF and netCDF-Fortran. When I looked at the config.log of those packages, all of them had been compiled with gfortran and gcc (and in some cases g++) with the --enable-shared option. So my question is: could that be a source of the "mismatch"?

In other words, I would have to recompile all those packages with mpif90 and mpicc and then try another test. At the very least, there should be no mixing of gcc/gfortran-compiled code with mpif90-compiled code. Comments?
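One quick way I could check for such a mismatch (assuming the Open MPI wrapper compilers) would be to ask mpif90 what it actually drives:

    mpif90 --showme:command   # the underlying compiler, e.g. gfortran
    mpif90 --showme:compile   # flags the wrapper adds when compiling
    mpif90 --showme:link      # flags/libraries the wrapper adds when linking

If it reports the same gfortran/gcc that built the third-party libraries, perhaps mixing them is not a problem, since the wrapper only adds MPI flags around that compiler?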
Best regards,
Ashwin.

>Ashwin,

>Did you try to run your app with an MPICH-based library (MVAPICH,
>IntelMPI or even stock MPICH)?
>Or did you try with Open MPI v1.10?
>The stack trace does not indicate the double free occurs in MPI...
>It seems you ran valgrind against a shell and not your binary.
>Assuming your mpirun command is
>    mpirun lmparbin_all
>I suggest you try again with
>    mpirun --tag-output valgrind lmparbin_all
>That will generate one valgrind log per task, but the lines are prefixed,
>so it should be easier to figure out what is going wrong.

>Cheers,

>Gilles
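
If the interleaved output is hard to read, a variant that writes one valgrind log per process may help; --log-file is a standard valgrind option and %p expands to each process's PID (same binary name as above):

    mpirun --tag-output valgrind --leak-check=full --track-origins=yes \
           --log-file=valgrind.%p.log lmparbin_all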


On Sun, Jun 18, 2017 at 8:11 AM, ashwin .D <winas...@gmail.com> wrote:

    There is a sequential version of the same program COSMO (no
    reference to MPI) that I can run without any problems. Of course
    it takes a lot longer to complete. Now I also ran valgrind (not
    sure whether that is useful or not) and I have enclosed the logs.

    On Sat, Jun 17, 2017 at 7:20 PM, ashwin .D <winas...@gmail.com> wrote:

        Hello Gilles,
        I am enclosing all the information you requested.

        1) As an attachment I enclose the log file.
        2) I did rebuild Open MPI 2.1.1 with the --enable-debug feature
        and I reinstalled it under /usr/local/lib.
        I ran all the examples in the examples directory. All passed
        except oshmem_strided_puts, where I got this message:

        [[48654,1],0][pshmem_iput.c:70:pshmem_short_iput] Target PE #1
        is not in valid range
        --------------------------------------------------------------------------
        SHMEM_ABORT was invoked on rank 0 (pid 13409,
        host=a-Vostro-3800) with errorcode -1.
        --------------------------------------------------------------------------


        3) I deleted all old Open MPI versions under /usr/local/lib.
        4) I am using the COSMO weather model -
        http://www.cosmo-model.org/ - to run simulations.
        The support staff claim they have seen no errors with a
        similar setup. They use:

        1) gfortran 4.8.5
        2) Open MPI 1.10.1

        The only difference is that I use Open MPI 2.1.1.

        5) I did try this option as well: mpirun --mca btl tcp,self
        -np 4 cosmo, and I got the same error as in the mpi_logs file.

        6) Regarding compiler and linking options on Ubuntu 16.04:

        mpif90 --showme:compile and mpif90 --showme:link give me the
        options for compiling and linking.

        Here are the linking options from my makefile:

        -pthread -lmpi_usempi -lmpi_mpifh -lmpi

        7) I have a 64 bit OS.

        Well, I think I have responded to all of your questions. If I
        have not, please let me know and I will respond ASAP.
        The only thing I have not done is look at /usr/local/include;
        I saw some old Open MPI files there. If those need to be
        deleted, I will do so after I hear from you.

        Best regards,
        Ashwin.





_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
