Dear deal.II community, I have implemented a multiscale finite element problem using deal.II. It includes PDEs on different levels, namely a macroscopic/effective PDE and so-called "cell problems" (not to be confused with cells of the grid), which have to be solved in each quadrature point of the macroscopic grid in order to compute the effective coefficient of the macroscopic PDE.
I am trying to parallelize parts of the code using the Workstream class, in particular the solution of the cell problems. Unfortunately, I observe suboptimal parallel efficiency when I compare the runtimes for different numbers of threads: as an example, the efficiency is only approx. 75% when I change from one thread to two threads with MultiThreadInfo::set_thread_limit(). I should mention that I only compare the runtimes for the parallelized part of the program, i.e. I expect an efficiency close to 100%. Now I was able to trace back the efficiency problem to the following issue: when assembling the system matrix for a cell problem (which has to be done for each quadrature point of the macroscopic grid), I measure the following wall times for one call to scratch.fe_values.reinit(cell): - one thread: ~1.5e-6s, - two threads: ~5e-6s. If the time spent for the task (i.e. the local assembly) on a cell is comparably small, the parallel efficiency suffers. However, the problem vanishes (regain of ~100% efficiency) if the time spent on one cell during assembly is increased, e.g. by using higher order finite elements or by simply putting the thread to sleep for a sufficiently long time. This is not really an option in my application. I have also attached a MWE. A similar behavior can be observed in tutorial step 9 when the FE order in line 259 is lowered from 5 to 1. My questions are: - What exactly is the reason for this behavior of the FEValues::reinit-function, or how can I find it out by myself? - Is there anything I can do to circumvent this problem? Thank you in advance, any help/hint is highly appreciated. Best, Jonas -- The deal.II project is located at http://www.dealii.org/ For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en --- You received this message because you are subscribed to the Google Groups "deal.II User Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to dealii+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/47fa95e5-bcb3-479c-a253-2f0129ef1afdn%40googlegroups.com.
#include <deal.II/grid/tria.h> #include <deal.II/dofs/dof_handler.h> #include <deal.II/grid/grid_generator.h> #include <deal.II/fe/fe_q.h> #include <deal.II/dofs/dof_tools.h> #include <deal.II/fe/fe_values.h> #include <deal.II/base/quadrature_lib.h> #include <deal.II/base/function.h> #include <deal.II/numerics/vector_tools.h> #include <deal.II/numerics/matrix_tools.h> #include <deal.II/lac/vector.h> #include <deal.II/lac/full_matrix.h> #include <deal.II/lac/sparse_matrix.h> #include <deal.II/lac/dynamic_sparsity_pattern.h> #include <deal.II/numerics/data_out.h> #include <deal.II/base/work_stream.h> #include <deal.II/base/multithread_info.h> #include <deal.II/base/timer.h> #include <fstream> #include <iostream> #include <thread> #include <chrono> #include <deal.II/base/logstream.h> using namespace dealii; using namespace std::chrono_literals; template <int dim> class Efficiency_Issue { public: Efficiency_Issue(); void run(); private: void make_grid(); void setup_system(); Triangulation<dim> triangulation; FE_Q<dim> fe; DoFHandler<dim> dof_handler; SparsityPattern sparsity_pattern; SparseMatrix<double> system_matrix; Vector<double> system_rhs; struct PerTaskData{ FullMatrix<double> cell_matrix; Vector<double> cell_rhs; std::vector<unsigned int> dof_indices; }; struct ScratchData { ScratchData(const FiniteElement<dim> &fe); ScratchData(const ScratchData &scratch); FEValues<dim> fe_values; }; void assemble_on_one_cell(const typename DoFHandler<dim>::active_cell_iterator &cell, ScratchData &scratch, PerTaskData &data); void copy_local_to_global(const PerTaskData &data); void assemble_system_in_parallel(); }; template <int dim> Efficiency_Issue<dim>::ScratchData::ScratchData(const FiniteElement<dim> &fe) : fe_values(fe, QGauss<dim>(fe.degree+1), update_values | update_gradients | update_quadrature_points | update_JxW_values) {} template <int dim> Efficiency_Issue<dim>::ScratchData::ScratchData(const ScratchData &scratch) : fe_values(scratch.fe_values.get_fe(), scratch.fe_values.get_quadrature(), update_values | update_gradients | update_quadrature_points | update_JxW_values) {} template <int dim> Efficiency_Issue<dim>::Efficiency_Issue() : fe(1) , dof_handler(triangulation) {} template <int dim> void Efficiency_Issue<dim>::make_grid() { GridGenerator::hyper_cube(triangulation, -1, 1); triangulation.refine_global(8); std::cout << " Number of active cells: " << triangulation.n_active_cells() << std::endl; } template <int dim> void Efficiency_Issue<dim>::setup_system() { dof_handler.distribute_dofs(fe); } template <int dim> void Efficiency_Issue<dim>::assemble_on_one_cell(const typename DoFHandler<dim>::active_cell_iterator &cell, ScratchData &scratch, PerTaskData &data) { scratch.fe_values.reinit(cell); std::this_thread::sleep_for(1e-6s); /* * efficiency of 23% with sleep_for(0s) * efficiency of 56% with sleep_for(1e-6s) * efficiency of 79% with sleep_for(1e-5s) * efficiency of 97% with sleep_for(1e-4s) */ } template <int dim> void Efficiency_Issue<dim>::copy_local_to_global(const PerTaskData &data) { // } template <int dim> void Efficiency_Issue<dim>::assemble_system_in_parallel() { int N_threads = 1; MultithreadInfo::set_thread_limit(N_threads); Timer timer; timer.start(); WorkStream::run(dof_handler.begin_active(), dof_handler.end(), *this, &Efficiency_Issue::assemble_on_one_cell, &Efficiency_Issue::copy_local_to_global, ScratchData(fe), PerTaskData()); std::cout << "Elapsed wall time with " << N_threads << " thread(s): " << timer.wall_time() << std::endl; } template <int dim> void Efficiency_Issue<dim>::run() { make_grid(); setup_system(); assemble_system_in_parallel(); } int main() { Efficiency_Issue<2> problem; problem.run(); return 0; }