On Thu, Feb 18, 2021 at 7:15 PM Barry Smith <[email protected]> wrote:
> On Feb 18, 2021, at 6:10 AM, Matthew Knepley <[email protected]> wrote:
>
>> On Thu, Feb 18, 2021 at 3:09 AM Roland Richter <[email protected]> wrote:
>>
>>> Hei,
>>>
>>> that was the reason for the increased run times. When removing #pragma omp
>>> parallel for, my loop took ~18 seconds. When changing it to #pragma omp
>>> parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on
>>> an i7-6700), the loop took ~16 s, but when increasing it to #pragma omp
>>> parallel for num_threads(8), the loop took 28 s.
>>
>> Editorial: This is a reason I think OpenMP is inappropriate as a tool for
>> parallel computing (many people disagree). It makes resource management
>> difficult for the user and impossible for a library.
>
> It is possible to control these things properly with modern OpenMP APIs,
> but, like MPI implementations, this can require some mucking around that a
> beginner would not know about, and the default settings can be terrible.
> MPI implementations are not better; their default bindings are generally
> horrible.

MPI allows the library to understand what resources are available and used.
The last time we looked at it, OpenMP did not have such a context object that
gets passed into the library (the analogue of an MPI communicator). The user
could construct one, but then the "usability" of OpenMP fades away.

   Matt

>    Barry
>
>> Thanks,
>>
>>    Matt
>>
>>> Regards,
>>>
>>> Roland
>>>
>>> On 17.02.21 at 18:51, Matthew Knepley wrote:
>>>
>>>> Jed, is it possible that this is an oversubscription penalty from bad
>>>> OpenMP settings? <said by a person who knows less about OpenMP than
>>>> cuneiform>
>>>>
>>>> Thanks,
>>>>
>>>>    Matt
>>>>
>>>> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <[email protected]> wrote:
>>>>
>>>>> My PetscScalar is complex double (i.e. an even higher penalty), but my
>>>>> matrix has a size of 8kk (8 million) elements, so that should not be an
>>>>> issue.
>>>>>
>>>>> Regards,
>>>>> Roland
>>>>> ------------------------------
>>>>> From: Jed Brown <[email protected]>
>>>>> Sent: Wednesday, 17 February 2021 17:49:49
>>>>> To: Roland Richter; PETSc
>>>>> Subject: Re: [petsc-users] Explicit linking to OpenMP results in
>>>>> performance drop and wrong results
>>>>>
>>>>> Roland Richter <[email protected]> writes:
>>>>>
>>>>>> Hei,
>>>>>>
>>>>>> I replaced the linking line with
>>>>>>
>>>>>> /usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd
>>>>>> -DMKL_LP64 -m64
>>>>>> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>>>>>> bin/armadillo_with_PETSc
>>>>>> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>>>>>> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>>>>>> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>>>>>> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>>>>>> /opt/boost/lib/libboost_filesystem.so.1.72.0
>>>>>> /opt/boost/lib/libboost_mpi.so.1.72.0
>>>>>> /opt/boost/lib/libboost_program_options.so.1.72.0
>>>>>> /opt/boost/lib/libboost_serialization.so.1.72.0
>>>>>> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>>>>>> /opt/petsc_release/lib/libpetsc.so
>>>>>> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>>>>>>
>>>>>> and now the results are correct. Nevertheless, when comparing the loop
>>>>>> in lines 26-28 of the file test_scaling.cpp
>>>>>>
>>>>>> #pragma omp parallel for
>>>>>> for(int i = 0; i < r_0 * r_1; ++i)
>>>>>>     *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);
>>>>>>
>>>>>> the version without #pragma omp parallel for is significantly faster
>>>>>> (i.e. 18 s vs 28 s) compared to the version with omp. Why is there
>>>>>> still such a big difference?
>>>>>
>>>>> Sounds like you're using a profile to attribute time? Each `omp
>>>>> parallel` region incurs a cost ranging from about a microsecond to 10 or
>>>>> more microseconds depending on architecture, number of threads, and
>>>>> OpenMP implementation. Your loop (for double precision) operates at
>>>>> around 8 entries per clock cycle (depending on architecture) if the
>>>>> operands are in cache, so the loop size r_0 * r_1 should be at least
>>>>> 10000 just to pay off the cost of `omp parallel`.

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
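
To make Jed's break-even estimate concrete, here is a minimal standalone
timing sketch (not code from the thread; the array sizes, repetition count,
and scaling factor are illustrative assumptions). It times the same kind of
complex scaling loop with and without the parallel region, so the fixed cost
of entering `omp parallel` can be compared against the per-element work:

    // Standalone sketch, not code from the thread: compare the same complex
    // scaling loop with and without an OpenMP parallel region for several
    // sizes. Sizes, repetition count, and the scaling factor are arbitrary.
    #include <omp.h>
    #include <complex>
    #include <cstdio>
    #include <vector>

    int main() {
      const std::complex<double> scaling_factor(2.0, 0.5);
      for (long n : {1000L, 10000L, 100000L, 8000000L}) {
        std::vector<std::complex<double>> in(n, std::complex<double>(1.0, 1.0)), out(n);
        const int reps = 100;

        // Serial version of the scaling loop.
        double t0 = omp_get_wtime();
        for (int r = 0; r < reps; ++r)
          for (long i = 0; i < n; ++i)
            out[i] = in[i] * scaling_factor;
        const double serial = (omp_get_wtime() - t0) / reps;

        // Same loop; each repetition now pays the cost of starting a region.
        t0 = omp_get_wtime();
        for (int r = 0; r < reps; ++r) {
          #pragma omp parallel for
          for (long i = 0; i < n; ++i)
            out[i] = in[i] * scaling_factor;
        }
        const double with_omp = (omp_get_wtime() - t0) / reps;

        // Print one entry so the compiler cannot drop the loops entirely.
        std::printf("n = %8ld  serial %.3e s  omp %.3e s  (check %.1f)\n",
                    n, serial, with_omp, out[0].real());
      }
      return 0;
    }

Built with something like g++ -O3 -fopenmp, the per-pass time for small n is
dominated by the region start-up, while for n around 10^4 and above the
parallel version should start to pay off, which is Jed's point. On a 4-core,
8-thread i7-6700, num_threads(8) places two threads per core on a
bandwidth-bound loop, which is one plausible reading of the num_threads(8)
slowdown reported above; explicit settings such as OMP_NUM_THREADS and
OMP_PROC_BIND/OMP_PLACES are the usual way to avoid relying on the defaults
Barry mentions.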
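
Matt's contrast between the two models can also be illustrated with a small
sketch (the names below are hypothetical, not the PETSc API): an MPI-based
library receives its resources explicitly through a communicator argument,
so the caller can hand it exactly the processes it is allowed to use,
whereas OpenMP offers no analogous context object to pass across a library
boundary.

    // Hypothetical example, not the PETSc API: the communicator is the
    // resource context referred to above. The caller decides which
    // processes the library may use and passes that decision in explicitly.
    #include <mpi.h>
    #include <cstdio>

    // A library routine learns its resources from the communicator argument.
    void library_scale(MPI_Comm comm, double *local_data, int local_n, double alpha) {
      int rank, size;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &size);
      for (int i = 0; i < local_n; ++i)
        local_data[i] *= alpha;
      if (rank == 0)
        std::printf("library ran on %d process(es)\n", size);
    }

    int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int world_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

      // The caller restricts the library to a subgroup, here the even ranks.
      MPI_Comm sub;
      MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub);

      double data[4] = {1.0, 2.0, 3.0, 4.0};
      if (world_rank % 2 == 0)
        library_scale(sub, data, 4, 2.0);

      MPI_Comm_free(&sub);
      MPI_Finalize();
      return 0;
    }

With OpenMP the nearest equivalents are process-wide settings
(OMP_NUM_THREADS, num_threads clauses, OMP_PROC_BIND), which is what makes
resource management hard to coordinate between user code and a library, as
discussed in the thread.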
