Satish, I checked with the Intel support team and they told me that "Fortran does not allow what it calls 'recursive I/O' (except for internal files) - once you start an I/O operation on a unit, no other operation on that unit may begin". So the !$OMP CRITICAL directive around the writes is indeed necessary. As for why I get the recursive I/O error when linking with PETSc but not without it, my guess is that with so many libraries linked in, the I/O operations become slower, which makes it likelier that two threads collide on the same unit. In any case, everything is solved - thanks for worrying about the problem! My code is now parallelized with OpenMP and makes use of PETSc without further problems.
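For reference, the fix amounts to serializing the writes inside hello_print, along these lines (a sketch; the private clause is my addition, so that the thread ids do not race):

    !$omp parallel private(nthreads, mythread)
    nthreads = omp_get_num_threads()
    mythread = omp_get_thread_num()
    ! only one thread at a time may have an I/O operation open on a unit
    !$omp critical
    write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
    !$omp end critical
    !$omp end parallel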
Thanks! Adrian.

2018-03-02 20:25 GMT+01:00 Satish Balay <ba...@mcs.anl.gov>:

> I just tried your test code with gfortran [without petsc] - and I
> don't understand it. Does gfortran not support this openmp usage?
>
> [tried gfortran 4.8.4 and 7.3.1]
>
> balay@es^/sandbox/balay/omp $ ls
> hellocount.F90 hellocount_main.F90
> balay@es^/sandbox/balay/omp $ gfortran -fopenmp -c hellocount.F90
> balay@es^/sandbox/balay/omp $ gfortran -fopenmp hellocount_main.F90 hellocount.o
> balay@es^/sandbox/balay/omp $ ./a.out
> Hello from 11 out of 32
> [the same line repeated 25 times in total]
> Hello from 14 out of 32
> [the same line repeated 7 times in total]
>
> ifort compiled test appears to behave correctly:
>
> balay@es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
> balay@es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o
> balay@es^/sandbox/balay/omp $ ./a.out |sort -n
> Hello from 0 out of 32
> [... one line for each of the 32 threads, 0 through 31 ...]
> Hello from 9 out of 32
> balay@es^/sandbox/balay/omp
> Now I build petsc with:
>
> ./configure --with-cc=icc --with-mpi=0 --with-openmp --with-fc=0
> --with-cxx=0 PETSC_ARCH=arch-omp
>
> i.e.
>
> balay@es^/sandbox/balay/omp $ ldd /sandbox/balay/petsc/arch-omp/lib/libpetsc.so
>         linux-vdso.so.1 => (0x00007fff8bfb2000)
>         liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f513fbbf000)
>         libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f513e3b6000)
>         libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f513e081000)
>         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f513de63000)
>         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f513dc5f000)
>         libimf.so => /soft/com/packages/intel/16/u3/lib/intel64/libimf.so (0x00007f513d761000)
>         libsvml.so => /soft/com/packages/intel/16/u3/lib/intel64/libsvml.so (0x00007f513c855000)
>         libirng.so => /soft/com/packages/intel/16/u3/lib/intel64/libirng.so (0x00007f513c4e3000)
>         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f513c1dd000)
>         libiomp5.so => /soft/com/packages/intel/16/u3/lib/intel64/libiomp5.so (0x00007f513be99000)
>         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f513bc83000)
>         libintlc.so.5 => /soft/com/packages/intel/16/u3/lib/intel64/libintlc.so.5 (0x00007f513ba17000)
>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f513b64e000)
>         libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f513b334000)
>         libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f513b115000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007f5142b40000)
>         libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f513aed9000)
>         libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f513acd5000)
>         libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f513aacf000)
>
> And - then link in petsc with your test - and that works fine for me.
>
> balay@es^/sandbox/balay/omp $ rm -f *.o *.mod
> balay@es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
> balay@es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o
> -Wl,-rpath,/sandbox/balay/petsc/arch-omp/lib -L/sandbox/balay/petsc/arch-omp/lib -lpetsc -liomp5
> balay@es^/sandbox/balay/omp $ ./a.out |sort -n
> Hello from 0 out of 32
> [... one line for each of the 32 threads, 0 through 31, as above ...]
> Hello from 9 out of 32
>
> Satish
>
> On Fri, 2 Mar 2018, Adrián Amor wrote:
>
> > Thanks Satish, I tried the procedure you suggested and I get the same
> > behavior, so I guess that MKL is not the problem in this case (I agree
> > with you that the link line has to be cleaned up, though; my makefile is
> > a little chaotic with all the libraries I use).
> >
> > And thanks, Barry and Matthew! I'll ask on the Intel compiler forum,
> > since I also think this is a problem related to the compiler, and if I
> > make any progress I'll let you know. In the end, I guess I'll drop
> > acceleration through OpenMP threads...
> >
> > Thanks all!
> >
> > Adrian.
> >
> > 2018-03-02 17:11 GMT+01:00 Satish Balay <ba...@mcs.anl.gov>:
> >
> > > When using MKL, PETSc attempts to default to sequential MKL.
> > >
> > > Perhaps this pulls in a *conflicting* dependency against -liomp5, and
> > > one has to use threaded MKL for this case, i.e. not use
> > > -lmkl_sequential.
> > >
> > > You appear to have multiple MKL libraries linked in - it's not clear
> > > what they are for, and whether there are any conflicts among them:
> > >
> > > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> > >
> > > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> > >
> > > To test this out - suggest rebuilding PETSc with
> > > --download-fblaslapack [and no MKL or related packages] - and then
> > > running this test case you have [with openmp].
> > >
> > > And then add back one MKL package at a time..
> > >
> > > Satish
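Concretely, that test rebuild could look like the following (a sketch that reuses the options from the original configure quoted further down, with the MKL directory and the thread-related flags dropped; the exact option set to keep is an assumption):

    ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
        --download-fblaslapack --with-debugging=1 --with-scalar-type=complex \
        --with-openmp PETSC_ARCH=linux-intel-dbg

If that build runs the test cleanly, the MKL groups from the original link line can then be added back one at a time, per the suggestion above.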
> > > On Fri, 2 Mar 2018, Adrián Amor wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have been working for the last few months with PETSc in a FEM
> > > > program written in Fortran, so far sequential. Now I want to
> > > > parallelize it with OpenMP, and I have run into some problems, so I
> > > > have built a mockup program to try to localize the error.
> > > >
> > > > 1. I have compiled PETSc with these options:
> > > >
> > > > ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> > > > --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64/ --with-debugging=1
> > > > --with-scalar-type=complex --with-threadcomm --with-pthreadclasses
> > > > --with-openmp
> > > > --with-openmp-include=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > > --with-openmp-lib=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin/libiomp5.a
> > > > PETSC_ARCH=linux-intel-dbg PETSC-AVOID-MPIF-H=1
> > > >
> > > > (I have also tried removing --with-threadcomm --with-pthreadclasses,
> > > > and using libiomp5.so instead.)
> > > >
> > > > 2. The program to be executed is composed of two files. One is
> > > > hellocount.F90:
> > > >
> > > > MODULE hello_count
> > > >   use omp_lib
> > > >   IMPLICIT none
> > > >
> > > > CONTAINS
> > > >
> > > >   subroutine hello_print ()
> > > >     integer :: nthreads, mythread
> > > >
> > > >     !pragma hello-who-omp-f
> > > >     !$omp parallel
> > > >     nthreads = omp_get_num_threads()
> > > >     mythread = omp_get_thread_num()
> > > >     write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
> > > >     !$omp end parallel
> > > >     !pragma end
> > > >   end subroutine hello_print
> > > >
> > > > END MODULE hello_count
> > > >
> > > > and the other one is hellocount_main.F90:
> > > >
> > > > Program Hello
> > > >   USE hello_count
> > > >   call hello_print
> > > >   STOP
> > > > end Program Hello
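A side note on the gfortran output earlier in the thread (an observation, not a confirmed compiler bug): nthreads and mythread are shared by default inside the parallel region, so the threads race on them, and many threads can end up printing the same ids - which would explain "Hello from 11 out of 32" appearing 25 times. Declaring the two variables private makes the test well-defined with either compiler:

    !$omp parallel private(nthreads, mythread)
    nthreads = omp_get_num_threads()   ! same value in every thread
    mythread = omp_get_thread_num()    ! now genuinely per-thread
    write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
    !$omp end parallel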
> > > > 3. To compile these two files I use:
> > > >
> > > > rm -rf _obj
> > > > mkdir _obj
> > > >
> > > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount.F90 > _obj/hellocount.f90
> > > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount_main.F90 > _obj/hellocount_main.f90
> > > >
> > > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj
> > > > -I/home/aamor/MUMPS_5.1.2/include
> > > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> > > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/
> > > > -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include
> > > > -o _obj/hellocount.o -c _obj/hellocount.f90
> > > >
> > > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj
> > > > -I/home/aamor/MUMPS_5.1.2/include
> > > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> > > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/
> > > > -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include
> > > > -o _obj/hellocount_main.o -c _obj/hellocount_main.f90
> > > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj
> > > > -o exec/HELLO _obj/hellocount.o _obj/hellocount_main.o
> > > > /home/aamor/lib_tmp/libarpack_LinuxIntel15.a
> > > > /home/aamor/MUMPS_5.1.2/lib/libzmumps.a
> > > > /home/aamor/MUMPS_5.1.2/lib/libmumps_common.a
> > > > /home/aamor/MUMPS_5.1.2/lib/libpord.a
> > > > /home/aamor/parmetis-4.0.3/lib/libparmetis.a
> > > > /home/aamor/parmetis-4.0.3/lib/libmetis.a
> > > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> > > > -L/home/aamor/lib_tmp -lgidpost -lz
> > > > /home/aamor/lua-5.3.3/src/liblua.a
> > > > /home/aamor/ESEAS-master/libeseas.a
> > > > -Wl,-rpath,/home/aamor/petsc/linux-intel-dbg/lib -L/home/aamor/petsc/linux-intel-dbg/lib
> > > > -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64
> > > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > > -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > > -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/debug_mt
> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib
> > > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lX11 -lssl -lcrypto
> > > > -lifport -lifcore_pic -lmpicxx -ldl
> > > > [the same -Wl,-rpath/-L groups repeated] -lmpifort -lmpi -lmpigi -lrt -lpthread
> > > > [the same -Wl,-rpath/-L groups repeated] -limf -lsvml -lirng -lm -lipgo
> > > > -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s
> > > > [the same -Wl,-rpath/-L groups repeated] -ldl
> > > >
> > > > exec/HELLO
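Note that this link line carries both -lmkl_intel_thread (threaded MKL) and, further on, -lmkl_sequential - the mix Satish points out above. For comparison, a pared-down sketch of the same link step with a consistent, threaded-only MKL set (assuming MKLROOT points at the 2016.1.150 install, and omitting the application libraries such as MUMPS, ParMETIS and ARPACK for brevity):

    mpiifort -qopenmp -o exec/HELLO _obj/hellocount.o _obj/hellocount_main.o \
        -Wl,-rpath,/home/aamor/petsc/linux-intel-dbg/lib \
        -L/home/aamor/petsc/linux-intel-dbg/lib -lpetsc \
        -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 \
        -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 \
        -liomp5 -lpthread -lm -ldl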
> > > > 4. Then I have seen that:
> > > >
> > > > 4.1. If I set OMP_NUM_THREADS=2 and remove -lpetsc and -lifcore_pic
> > > > from the last step, I get:
> > > >
> > > > Hello from 0 out of 2
> > > > Hello from 1 out of 2
> > > >
> > > > 4.2. But if I add -lpetsc and -lifcore_pic back (because I want to
> > > > use PETSc), I get this error:
> > > >
> > > > Hello from 0 out of 2
> > > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > > Image            PC                Routine  Line     Source
> > > > HELLO            000000000041665C  Unknown  Unknown  Unknown
> > > > HELLO            00000000004083C8  Unknown  Unknown  Unknown
> > > > libiomp5.so      00007F9C603566A3  Unknown  Unknown  Unknown
> > > > libiomp5.so      00007F9C60325007  Unknown  Unknown  Unknown
> > > > libiomp5.so      00007F9C603246F5  Unknown  Unknown  Unknown
> > > > libiomp5.so      00007F9C603569C3  Unknown  Unknown  Unknown
> > > > libpthread.so.0  0000003CE76079D1  Unknown  Unknown  Unknown
> > > > libc.so.6        0000003CE6AE88FD  Unknown  Unknown  Unknown
> > > >
> > > > If I set OMP_NUM_THREADS to 8, I get:
> > > >
> > > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > >
> > > > I am sorry if this is a trivial problem, since I guess lots of people
> > > > use PETSc with OpenMP in Fortran, but I have really done my best to
> > > > figure out where the error is. Can you help me?
> > > >
> > > > Thanks a lot!
> > > >
> > > > Adrian.