Re: [deal.II] New deal.II 8.5.1 is 20% slower than deal.II 8.0.0

2017-12-30 Thread Praveen C


> On 30-Dec-2017, at 11:40 PM, luca.heltai  wrote:
> 
> I’m thinking of a matrix of size (n_quadrature_points x n_active_cells) x 
> n_dofs, and then  you slice the results cellwise instead of repeatedly 
> calling get_function_values.
> 
> once:
> 
> M.set(q + active_cell_index*n_quadrature_points, local_dof_indices[i], fe_values.shape_value(i,q));

This looks like a Vandermonde matrix, which would be identical on every cell. In 
that case, it is not necessary to store it for every cell.
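[Editor's note: a minimal plain-C++ sketch of this observation, not deal.II's actual API. Because shape values are evaluated on the reference cell, a single (n_q_points x n_dofs_per_cell) matrix V with V[q][i] = shape_value(i, q) suffices for all cells; per-cell values then come from one small matrix-vector product with the cell's local dof values.]

```cpp
#include <cstddef>
#include <vector>

// Evaluate u_h at the quadrature points of one cell:
//   u_h(x_q) = sum_i U_i * phi_i(x_q)
// where V[q][i] = phi_i(x_q) is the (cell-independent) shape-value matrix
// and local_dof_values holds the solution coefficients U_i of this cell.
std::vector<double>
values_at_quadrature_points(const std::vector<std::vector<double>> &V,
                            const std::vector<double> &local_dof_values)
{
  std::vector<double> result(V.size(), 0.0);
  for (std::size_t q = 0; q < V.size(); ++q)
    for (std::size_t i = 0; i < local_dof_values.size(); ++i)
      result[q] += V[q][i] * local_dof_values[i];
  return result;
}
```

Since V depends only on the reference element and the quadrature rule, it is computed once and reused on every cell, which is exactly why storing it per cell is unnecessary.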

Thanks
praveen

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [deal.II] New deal.II 8.5.1 is 20% slower than deal.II 8.0.0

2017-12-30 Thread luca.heltai
If all you are solving is a two-dimensional problem, you could encode your 
“get_function_values” calls as a matrix-vector multiplication to drastically 
improve the situation. 

I’m thinking of a matrix of size (n_quadrature_points x n_active_cells) x 
n_dofs; you then slice the results cellwise instead of repeatedly calling 
get_function_values.

once:

M.set(q + active_cell_index*n_quadrature_points, local_dof_indices[i], fe_values.shape_value(i,q));

at every solution step, before you actually need the values:
M.vmult(values, solution);

in every cell:
local_values = ArrayView<double>(&values[active_cell_index*n_quadrature_points], 
n_quadrature_points);
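[Editor's note: a self-contained C++ sketch of the scheme above, using plain std::vector stand-ins for the deal.II matrix/vector types; all names are illustrative. One matrix M of size (n_q_points * n_cells) x n_global_dofs is filled once with shape values, one matrix-vector product per solution update replaces all get_function_values calls, and each cell's values are a contiguous slice of the result.]

```cpp
#include <cstddef>
#include <vector>

// M[q + cell * n_q_points][global_dof(cell, i)] = shape_value(i, q).
// Dense here for brevity; in practice this matrix is very sparse.
struct QPointEvaluator
{
  std::size_t n_q;                    // quadrature points per cell
  std::vector<std::vector<double>> M; // (n_q * n_cells) x n_global_dofs

  QPointEvaluator(const std::vector<std::vector<double>> &shape_values, // [q][i]
                  const std::vector<std::vector<std::size_t>> &cell_dofs,
                  std::size_t n_global_dofs)
    : n_q(shape_values.size()),
      M(n_q * cell_dofs.size(), std::vector<double>(n_global_dofs, 0.0))
  {
    for (std::size_t cell = 0; cell < cell_dofs.size(); ++cell)
      for (std::size_t q = 0; q < n_q; ++q)
        for (std::size_t i = 0; i < cell_dofs[cell].size(); ++i)
          M[q + cell * n_q][cell_dofs[cell][i]] = shape_values[q][i];
  }

  // One product per solution update; row q + cell*n_q holds the value of
  // the finite element field at quadrature point q of that cell.
  std::vector<double> vmult(const std::vector<double> &solution) const
  {
    std::vector<double> values(M.size(), 0.0);
    for (std::size_t r = 0; r < M.size(); ++r)
      for (std::size_t c = 0; c < solution.size(); ++c)
        values[r] += M[r][c] * solution[c];
    return values;
  }
};
```

The per-cell slice is then values[cell * n_q .. cell * n_q + n_q), which is what the ArrayView in the recipe above extracts without any copy.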

L.


> On 27 Dec 2017, at 14:16, drgulev...@gmail.com wrote:
> 
> Thank you! 
> 
> Some guidance on how I could optimize the code would be appreciated. I am using 
> deal.II to solve a time-dependent nonlinear 2D problem (sort of 
> sine-Gordon, but a more advanced model which includes a history dependence, 
> https://github.com/drgulevich/mitmojco). The deal.II code spends most of its 
> time in:
> 
> 1. fe_values.get_function_values -- most of the wall time (70%)
> 2. fe_values.reinit -- less often
> 3. CG solver -- even less often
> 
> Kind regards,
> Dmitry
> 
> On Wednesday, December 27, 2017 at 11:49:42 AM UTC+3, Martin Kronbichler 
> wrote:
> In general, we strive to make deal.II faster with new releases, and for many 
> cases that is also true, as I can confirm from my own applications. I have run 
> step-23 on release 8.0 as well as the current development sources, and I can 
> confirm that the new version is slower on my machine. If I disable the output 
> of step-23, I get a run time of 4.7 seconds for version 8.0 and 5.3 seconds 
> for the current version. 
> 
> After some investigation I found that while some solver-related operations 
> did indeed get faster (the problem with 16k dofs is small enough to run from 
> L3 cache in my case), we are slower in the FEValues::reinit() calls. This 
> call appears in VectorTools::create_right_hand_side() and in 
> VectorTools::interpolate_boundary_values() in the time loop. The reason is 
> that we nowadays call "MappingQGeneric::compute_mapping_support_points" also 
> for the bilinear mapping MappingQ1, which allocates and de-allocates a 
> vector. While this is uncritical for higher-order mappings, in 2D with linear 
> shape functions the time spent there is indeed not negligible. 
> 
> This is unfortunate for your use case, but I want to stress that the changes 
> were made in the hope of making that part of the code more reliable. 
> Furthermore, those parts of the code are not performance-critical and their 
> performance is not accurately tracked. It is a rather isolated issue that got 
> worse here, so from this single example one can definitely not say that the 
> project is heading in the wrong direction.
> While there are plenty of things I could imagine to make this particular case 
> more efficient in the application code, well beyond the performance of what 
> version 8.0 provided (note that I would not write the code like that if it 
> were performance-critical), the only obvious fix on the library side is to 
> work around the memory allocations: rather than returning a vector from 
> MappingQGeneric::compute_mapping_support_points, we could fill an existing 
> array in MappingQGeneric::InternalData::mapping_support_points. None of us 
> developers has this high on the priority list right now, but we would 
> definitely appreciate it if one of our users, like you, wanted to look into 
> it. I could guide you to the right spots.
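[Editor's note: a minimal sketch of the refactoring pattern Martin describes, with illustrative names rather than the actual deal.II interfaces. The idea is to replace a function that returns a freshly allocated vector on every call with one that fills a caller-owned buffer, so that the hot loop performs no heap allocations after the first call.]

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Allocating variant: one heap allocation and deallocation per call,
// which is what makes the 2D/linear-mapping case measurably slower.
std::vector<Point> compute_support_points_alloc(std::size_t n)
{
  std::vector<Point> pts(n);
  for (std::size_t i = 0; i < n; ++i)
    pts[i] = {double(i), double(i)};
  return pts;
}

// Reusing variant: the buffer lives in a persistent object (analogous to
// a per-mapping InternalData member); resize() keeps the existing
// capacity, so after the first call no allocation happens.
void compute_support_points_inplace(std::size_t n, std::vector<Point> &buffer)
{
  buffer.resize(n);
  for (std::size_t i = 0; i < n; ++i)
    buffer[i] = {double(i), double(i)};
}
```

Both variants compute the same data; the second merely moves ownership of the storage out of the function so repeated calls in a time loop become allocation-free.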
> 
> Best regards,
> Martin
> 
> On 26.12.2017 21:22, drgul...@gmail.com wrote:
>> Yes, the two are attached. The key lines from their diff result:
>> 
>> $ diff detailed.log-v8.1.0 detailed.log-v8.5.1
>> ...
>> < #  Compiler flags used for this build:
>> < #CMAKE_CXX_FLAGS:  -pedantic -fpic -Wall 
>> -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch 
>> -Wno-unused-local-typedefs -Wno-long-long -Wno-deprecated 
>> -Wno-deprecated-declarations -std=c++11 -Wno-parentheses -Wno-long-long
>> < #DEAL_II_CXX_FLAGS_RELEASE:-O2 -funroll-loops 
>> -funroll-all-loops -fstrict-aliasing -Wno-unused
>> ---
>> > #  Base configuration (prior to feature configuration):
>> > #DEAL_II_CXX_FLAGS:-pedantic -fPIC -Wall -Wextra 
>> > -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch 
>> > -Woverloaded-virtual -Wno-long-long -Wno-deprecated-declarations 
>> > -Wno-literal-suffix -std=c++11
>> > #DEAL_II_CXX_FLAGS_RELEASE:-O2 -funroll-loops 
>> > -funroll-all-loops -fstrict-aliasing -Wno-unused-local-typedefs
>> 18c19
>> < #DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic -pthread
>> ---
>> > #DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic 
>> > -fuse-ld=gold
>> ...
>> > #BOOST_CXX_FLAGS = -Wno-unused-local-typedefs
>> ...
>> > #  ( DEAL_II_WITH_BZIP2 = OFF )
>> > #DEAL_II_WITH_CXX11 = ON
>> >