If you are only solving a two-dimensional problem, you could encode your
“get_function_values” calls into a single matrix-vector multiplication, which
should improve the situation drastically.
I’m thinking of a matrix of size (n_q_points x n_active_cells) x n_dofs: one
row per quadrature point of every active cell, one column per global dof. You
multiply it with the solution once per step and then slice the result cellwise
instead of repeatedly calling get_function_values.

Once, during setup (local_dof_indices maps the local index i to the global dof
index, as filled by cell->get_dof_indices):
  M[q + active_cell_index*n_q_points, local_dof_indices[i]] = fe_values.shape_value(i,q);

At every solution step, before you actually need the values:
  M.vmult(values, solution);

In every cell:
  local_values = ArrayView<double>(&values[active_cell_index*n_q_points],
                                   n_q_points);
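
To make the indexing concrete, here is a minimal sketch of the same idea in plain C++ without deal.II. Everything in it is an assumption for illustration: EvaluationMatrix and build_1d_linear are hypothetical names, the mesh is 1D with linear elements (cell c owns dofs c and c+1), and the matrix is stored as a fixed number of entries per row rather than as a deal.II SparseMatrix. Rows are numbered q + cell*n_q_points and columns are global dof indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the evaluation matrix M: row (q + cell*n_q_points)
// holds the shape function values at quadrature point q of that cell.
struct EvaluationMatrix
{
  std::size_t n_q_points = 0;       // quadrature points per cell
  std::size_t dofs_per_cell = 0;    // nonzero entries per row
  std::vector<std::size_t> columns; // global dof index of each entry
  std::vector<double> entries;      // shape value of each entry

  // values = M * solution, done once per solution step.
  void vmult(std::vector<double> &values,
             const std::vector<double> &solution) const
  {
    assert(columns.size() == entries.size());
    const std::size_t n_rows = entries.size() / dofs_per_cell;
    values.assign(n_rows, 0.0);
    for (std::size_t row = 0; row < n_rows; ++row)
      for (std::size_t i = 0; i < dofs_per_cell; ++i)
        {
          const std::size_t k = row * dofs_per_cell + i;
          values[row] += entries[k] * solution[columns[k]];
        }
  }

  // Slice of 'values' belonging to one cell (n_q_points consecutive numbers).
  const double *cell_values(const std::vector<double> &values,
                            const std::size_t cell) const
  {
    return values.data() + cell * n_q_points;
  }
};

// Setup, done once: 1D linear elements, reference coordinates xi in [0,1].
// Local shape functions are (1 - xi) and xi; cell c has global dofs c, c+1.
EvaluationMatrix build_1d_linear(const std::size_t n_cells,
                                 const std::vector<double> &q_points)
{
  EvaluationMatrix M;
  M.n_q_points = q_points.size();
  M.dofs_per_cell = 2;
  for (std::size_t cell = 0; cell < n_cells; ++cell)
    for (std::size_t q = 0; q < q_points.size(); ++q)
      {
        const double xi = q_points[q];
        M.columns.push_back(cell);
        M.entries.push_back(1.0 - xi);
        M.columns.push_back(cell + 1);
        M.entries.push_back(xi);
      }
  return M;
}
```

In the time loop you would call vmult once and then read cell_values(values, cell) inside the cell loop. With deal.II the entries would come from fe_values.shape_value(i,q) and the columns from the cell's dof indices, but that part is not shown here.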
L.
> On 27 Dec 2017, at 14:16, drgulev...@gmail.com wrote:
>
> Thank you!
>
> Some guidance on how I could optimize the code would be appreciated. I am
> using deal.II for solving a time-dependent nonlinear 2D problem (sort of
> sine-Gordon, but a more advanced model which includes a history dependence,
> https://github.com/drgulevich/mitmojco). The deal.II code spends most of its
> time in:
>
> 1. fe_values.get_function_values -- most of the wall time (70%)
> 2. fe_values.reinit -- less often
> 3. CG solver -- even less often
>
> Kind regards,
> Dmitry
>
> On Wednesday, December 27, 2017 at 11:49:42 AM UTC+3, Martin Kronbichler
> wrote:
> In general, we strive to make deal.II faster with new releases, and for many
> cases that is also true as I can confirm from my applications. I have run
> step-23 on release 8.0 as well as the current development sources and I can
> confirm that the new version is slower on my machine. If I disable output of
> step-23, I get a run time of 4.7 seconds for version 8.0 and 5.3 seconds for
> the current version. After some investigations I found out that while some
> solver-related operations got faster indeed (the problem with 16k dofs
> is small enough to run from L3 cache in my case), we are slower in the
> FEValues::reinit() calls. This call appears in
> VectorTools::create_right_hand_side() and the
> VectorTools::interpolate_boundary_values in the time loop. The reason for
> this is that we nowadays call
> "MappingQGeneric::compute_mapping_support_points" also for the bilinear
> mapping MappingQ1, which allocates and de-allocates a vector. While this is
> uncritical on higher order mappings, in 2D with linear shape functions the
> time spent there is indeed not negligible. This is indeed unfortunate for
> your use case, but I want to stress that the changes were made in the hope to
> make that part of the code more reliable. Furthermore, those parts of the
> code are not performance critical and not accurately tracked. It is a rather
> isolated issue that got worse here, so from this single example one
> definitely cannot say that we are going in the wrong direction as a project.
> While there are plenty of things I could imagine to make this particular case
> more efficient in the application code, way beyond the performance of what
> the version 8.0 provided - note that I would not write the code like that if
> it were performance critical - the only obvious thing is that we could try to
> work around the memory allocations by not returning a vector in
> MappingQGeneric::compute_mapping_support_points but rather fill an existing
> array in MappingQGeneric::InternalData::mapping_support_points. None of us
> developers has this high on the priority list right now, but we would
> definitely appreciate it if some of our users, like you, want to look into
> that. I could guide you to the right spots.
>
> Best regards,
> Martin
>
> On 26.12.2017 21:22, drgul...@gmail.com wrote:
>> Yes, the two are attached. The key lines from their diff result:
>>
>> $ diff detailed.log-v8.1.0 detailed.log-v8.5.1
>> ...
>> < # Compiler flags used for this build:
>> < #CMAKE_CXX_FLAGS: -pedantic -fpic -Wall
>> -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch
>> -Wno-unused-local-typedefs -Wno-long-long -Wno-deprecated
>> -Wno-deprecated-declarations -std=c++11 -Wno-parentheses -Wno-long-long
>> < #DEAL_II_CXX_FLAGS_RELEASE:-O2 -funroll-loops
>> -funroll-all-loops -fstrict-aliasing -Wno-unused
>> ---
>> > # Base configuration (prior to feature configuration):
>> > #DEAL_II_CXX_FLAGS:-pedantic -fPIC -Wall -Wextra
>> > -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch
>> > -Woverloaded-virtual -Wno-long-long -Wno-deprecated-declarations
>> > -Wno-literal-suffix -std=c++11
>> > #DEAL_II_CXX_FLAGS_RELEASE:-O2 -funroll-loops
>> > -funroll-all-loops -fstrict-aliasing -Wno-unused-local-typedefs
>> 18c19
>> < #DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic -pthread
>> ---
>> > #DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic
>> > -fuse-ld=gold
>> ...
>> > #BOOST_CXX_FLAGS = -Wno-unused-local-typedefs
>> ...
>> > # ( DEAL_II_WITH_BZIP2 = OFF )
>> > #DEAL_II_WITH_CXX11 = ON
>> >