Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Wolfgang Bangerth

On 10/19/22 08:45, Simon Wiesheier wrote:


What I want to do boils down to the following:
Given the reference co-ordinates of a point 'p', along with the cell on 
which 'p' lives,
give me the value and gradient of a finite element function evaluated at 
'p'.


My idea was to create a quadrature object with 'p' being the only 
quadrature point and pass this
quadrature object to the FEValues object and finally do the 
.reinit(cell) call (then, of course, get_function_values()...)
'p' is different for all (2.5 million) quadrature points, which is why I 
create the FEValues object so many times.


It's worth pointing out that this is exactly what VectorTools::point_values()
does.


(As others have already mentioned, if you want to do that many many 
times over, this is too expensive and you should be using 
FEPointEvaluation instead.)
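
For concreteness, here is a minimal sketch using the closely related
single-point functions VectorTools::point_value() and
VectorTools::point_gradient(); this is an added illustration rather than
part of the original reply, and it assumes a scalar field, a DoFHandler
'dof_handler', a solution vector 'solution', the point 'p', and the space
dimension 'dim':

  #include <deal.II/numerics/vector_tools.h>

  // Convenient, but each call internally builds the one-point quadrature
  // and FEValues object described above; fine for occasional use, slow
  // when repeated millions of times.
  const double value =
    VectorTools::point_value(dof_handler, solution, p);
  const Tensor<1, dim> gradient =
    VectorTools::point_gradient(dof_handler, solution, p);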


Best
 W.

--

Wolfgang Bangerth  email: bange...@colostate.edu
   www: http://www.math.colostate.edu/~bangerth/

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/cd1c8fa0-443d-b7bf-b433-f5ab033a247c%40colostate.edu.


Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Martin Kronbichler
Dear Simon,

You seem to be looking for FEPointEvaluation. That class is shown in
step-19 and provides, for simple FiniteElement types, a much faster way to
evaluate solutions at arbitrary points within a cell. Do you want to give
it a try? The issue you are facing is that the FEValues class you are using
goes through a very abstract entry point that does precomputations which
only pay off if the same unit points are used many times. And even in the
case of the same unit points it is not really fast; it is a general-purpose
baseline that I would not recommend for high-performance purposes.

As a final note, I would mention that FEPointEvaluation falls back to
FEValues for complicated FiniteElement types, so it might be that you do
not get speedups in those cases. But we could work on it if you need it,
today we know much better what to do than a few years ago.
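
A minimal sketch of that pattern, loosely following step-19 (an added
illustration; it assumes a scalar element 'fe', a 'mapping', the owning
'cell', the reference coordinates 'unit_point' of the evaluation point,
and the cell's local solution coefficients 'cell_dof_values'):

  #include <deal.II/matrix_free/fe_point_evaluation.h>

  // Construct once and reuse for every cell and point; one component here.
  FEPointEvaluation<1, dim> evaluator(mapping, fe,
                                      update_values | update_gradients);

  // Hand over the cell and the single reference point of interest,
  evaluator.reinit(cell, ArrayView<const Point<dim>>(&unit_point, 1));

  // evaluate the finite element function from its local coefficients,
  evaluator.evaluate(make_array_view(cell_dof_values),
                     EvaluationFlags::values | EvaluationFlags::gradients);

  // and read off value and gradient at that point.
  const double         value    = evaluator.get_value(0);
  const Tensor<1, dim> gradient = evaluator.get_gradient(0);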

Best,
Martin

On Wed, 19 Oct 2022, 16:45 Simon Wiesheier, 
wrote:

> " It's an environment variable. "
>
> I did
> echo $DEAL_II_NUM_THREADS
> and the variable is not set.
> But if it were set to one, why would this explain the gap between cpu and
> wall time?
>
> " My point is the constructor should not be called millions of times. You
> are not going to be able to get that function 100 times faster. It's best
> to find a way to call it less often. "
>
> What I want to do boils down to the following:
> Given the reference co-ordinates of a point 'p', along with the cell on
> which 'p' lives,
> give me the value and gradient of a finite element function evaluated at
> 'p'.
>
> My idea was to create a quadrature object with 'p' being the only
> quadrature point and pass this
> quadrature object to the FEValues object and finally do the .reinit(cell)
> call (then, of course, get_function_values()...)
> 'p' is different for all (2.5 million) quadrature points, which is why I
> create the FEValues object so many times.
>
> Do you have a different suggestion to solve my problem, i.e., to evaluate
> the finite element field and its derivatives at 'p'?
>
> Best,
> Simon
>
>
> On Wed., Oct. 19, 2022 at 16:17, Bruno Turcksin <
> bruno.turck...@gmail.com> wrote:
>
>> Simon,
>>
>> On Wed., Oct. 19, 2022 at 09:33, Simon Wiesheier wrote:
>>
>>> Thank you for your answer!
>>>
>>> " Did you set DEAL_II_NUM_THREADS=1?"
>>>
>>> How can I double-check that?
>>> ccmake .
>>> only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR.
>>> But I do not know if this is the right place to look.
>>>
>> It's an environment variable. If you are using bash, you can do
>>
>> export DEAL_II_NUM_THREADS=1
>>
>>
>>>
>>> " That could explain why CPU and Wall time are different. Finally, if I
>>> understand correctly, you are calling the constructor of FEValues about 2.5
>>> million times. That means that one call to the FEValues constructor takes
>>> 100/2.5e6 seconds, i.e., about 40 microseconds. That doesn't seem too slow. "
>>>
>>> There was a typo in my post. It should be 160/2.5e6 seconds, i.e., about 64
>>> microseconds.
>>>
>> My point is the constructor should not be called millions of times. You
>> are not going to be able to get that function 100 times faster. It's best
>> to find a way to call it less often.
>>
>> Best,
>>
>> Bruno
>>

[deal.II] deal.II Newsletter #230

2022-10-19 Thread 'Rene Gassmoeller' via deal.II User Group
Hello everyone!

This is deal.II newsletter #230.
It automatically reports recently merged features and discussions about the 
deal.II finite element library.


## Below you find a list of recently proposed or merged features:

#14362: Execute explicit instantiations of compute_intersection_of_cells() 
(proposed by jh66637) https://github.com/dealii/dealii/pull/14362

#14361: introduce BoundingBox::is_neighbor() (proposed by jh66637) 
https://github.com/dealii/dealii/pull/14361

#14360: Get vertices in CGAL order on face (proposed by jh66637) 
https://github.com/dealii/dealii/pull/14360

#14357: Fix two typos in ConsensusAlgorithm::NBX (proposed by gassmoeller) 
https://github.com/dealii/dealii/pull/14357

#14356: [WIP] Remove communication in CollectiveMutex during stack unwinding 
(proposed by gassmoeller) https://github.com/dealii/dealii/pull/14356

#14355: Use Ubuntu 20.04 in GitHub actions (proposed by masterleinad) 
https://github.com/dealii/dealii/pull/14355

#14354: VectorizedArray: mixed load/store (proposed by peterrum; merged) 
https://github.com/dealii/dealii/pull/14354

#14353: Add tolerance to avoid empty CGAL intersections due to roundoff 
(proposed by jh66637) https://github.com/dealii/dealii/pull/14353

#14352: [WIP] Add TrilinosWrappers::SolverBelos (proposed by peterrum) 
https://github.com/dealii/dealii/pull/14352

#14351: GitHub CI: update Ubuntu version (proposed by peterrum) 
https://github.com/dealii/dealii/pull/14351

#14350: Refactor SolverGMRES::modified_gram_schmidt (proposed by peterrum; 
merged) https://github.com/dealii/dealii/pull/14350

#14349: SolverGMRES: add classical Gram-Schmidt (proposed by peterrum) 
https://github.com/dealii/dealii/pull/14349

#14348: Implement ReferenceCell::n_face_orientations(). (proposed by drwells; 
merged) https://github.com/dealii/dealii/pull/14348

#14339: Fix some doxygen problems. (proposed by bangerth; merged) 
https://github.com/dealii/dealii/pull/14339


## And this is a list of recently opened or closed discussions:

#14359: CGAL intersection bug (opened) 
https://github.com/dealii/dealii/issues/14359

#14358: dealii:when i use Gmsh to generate .msh file, Number of 
coupling nodes: 0 (opened) https://github.com/dealii/dealii/issues/14358


A list of all major changes since the last release can be found at 
https://www.dealii.org/developer/doxygen/deal.II/recent_changes.html.


Thanks for being part of the community!


Let us know about questions, problems, bugs or just share your experience by 
writing to dealii@googlegroups.com, or by opening issues or pull requests at 
https://www.github.com/dealii/dealii.
Additional information can be found at https://www.dealii.org/.



Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Simon Wiesheier
" It's an environment variable. "

I did
echo $DEAL_II_NUM_THREADS
and the variable is not set.
But if it were set to one, why would this explain the gap between cpu and
wall time?

" My point is the constructor should not be called millions of times. You
are not going to be able to get that function 100 times faster. It's best
to find a way to call it less often. "

What I want to do boils down to the following:
Given the reference co-ordinates of a point 'p', along with the cell on
which 'p' lives,
give me the value and gradient of a finite element function evaluated at
'p'.

My idea was to create a quadrature object with 'p' being the only
quadrature point and pass this
quadrature object to the FEValues object and finally do the .reinit(cell)
call (then, of course, get_function_values()...)
'p' is different for all (2.5 million) quadrature points, which is why I
create the FEValues object so many times.

Do you have a different suggestion to solve my problem, i.e., to evaluate the
finite element field and its derivatives at 'p'?

Best,
Simon


On Wed., Oct. 19, 2022 at 16:17, Bruno Turcksin <
bruno.turck...@gmail.com> wrote:

> Simon,
>
> On Wed., Oct. 19, 2022 at 09:33, Simon Wiesheier wrote:
>
>> Thank you for your answer!
>>
>> " Did you set DEAL_II_NUM_THREADS=1?"
>>
>> How can I double-check that?
>> ccmake .
>> only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR.
>> But I do not know if this is the right place to look.
>>
> It's an environment variable. If you are using bash, you can do
>
> export DEAL_II_NUM_THREADS=1
>
>
>>
>> " That could explain why CPU and Wall time are different. Finally, if I
>> understand correctly, you are calling the constructor of FEValues about 2.5
>> million times. That means that one call to the FEValues constructor takes
>> 100/2.5e6 seconds, i.e., about 40 microseconds. That doesn't seem too slow. "
>>
> There was a typo in my post. It should be 160/2.5e6 seconds, i.e., about 64
> microseconds.
>>
> My point is the constructor should not be called millions of times. You
> are not going to be able to get that function 100 times faster. It's best
> to find a way to call it less often.
>
> Best,
>
> Bruno
>


Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Bruno Turcksin
Simon,

On Wed., Oct. 19, 2022 at 09:33, Simon Wiesheier wrote:

> Thank you for your answer!
>
> " Did you set DEAL_II_NUM_THREADS=1?"
>
> How can I double-check that?
> ccmake .
> only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR.
> But I do not know if this is the right place to look.
>
It's an environment variable. If you are using bash, you can do

export DEAL_II_NUM_THREADS=1
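
An added note, not part of the original reply: the same limit can also be
enforced from inside the program, which avoids depending on the environment
being set. A minimal sketch:

  #include <deal.II/base/multithread_info.h>

  int main()
  {
    // Cap deal.II's task-based parallelism at a single thread for the
    // whole run; with an argument of 1 this holds no matter what
    // DEAL_II_NUM_THREADS is set to.
    dealii::MultithreadInfo::set_thread_limit(1);

    // ... rest of the program ...
    return 0;
  }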


>
> " That could explain why CPU and Wall time are different. Finally, if I
> understand correctly, you are calling the constructor of FEValues about 2.5
> million times. That means that one call to the FEValues constructor takes
> 100/2.5e6 seconds, i.e., about 40 microseconds. That doesn't seem too slow. "
>
> There was a typo in my post. It should be 160/2.5e6 seconds, i.e., about 64
> microseconds.
>
My point is the constructor should not be called millions of times. You are
not going to be able to get that function 100 times faster. It's best to
find a way to call it less often.

Best,

Bruno



Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Simon Wiesheier
Thank you for your answer!

" Did you set DEAL_II_NUM_THREADS=1?"

How can I double-check that?
ccmake .
only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR.
But I do not know if this is the right place to look.

" That could explain why CPU and Wall time are different. Finally, if I
understand correctly, you are calling the constructor of FEValues about 2.5
million times. That means that one call to the FEValues constructor takes
100/2.5e6 seconds, i.e., about 40 microseconds. That doesn't seem too slow. "

There was a typo in my post. It should be 160/2.5e6 seconds, i.e., about 64
microseconds.

Best,
Simon

On Wed., Oct. 19, 2022 at 15:08, Bruno Turcksin <
bruno.turck...@gmail.com> wrote:

> Simon,
>
> The best way to profile a code is to use a profiler. It can give a lot
> more information than simple timers can. You say that your code is
> not parallelized, but by default deal.II is multithreaded. Did you set
> DEAL_II_NUM_THREADS=1? That could explain why CPU and Wall time are
> different. Finally, if I understand correctly, you are calling the
> constructor of FEValues about 2.5 million times. That means that one call
> to the FEValues constructor takes 100/2.5e6 seconds, i.e., about 40
> microseconds. That doesn't seem too slow.
>
> Best,
>
> Bruno
>
> On Wednesday, October 19, 2022 at 7:51:55 AM UTC-4 Simon wrote:
>
>> Dear all,
>>
>> I implemented two different versions to compute a stress for a given
>> strain and want to compare the associated computation times in release mode.
>>
>> version 1: stress = fun1(strain)   cpu time:  4.52 s   wall time:   4.53 s
>> version 2: stress = fun2(strain)   cpu time: 32.5 s    wall time: 167.5 s
>>
>> fun1 and fun2, respectively, are invoked for all quadrature points
>> (1,286,144 in the above example) defined on the triangulation. My program
>> is not parallelized.
>> In fun2, I call find_active_cell_around_point twice for two different
>> points on two different (helper) triangulations and initialize two
>> FEValues objects with the points 'ref_point_vol' and 'ref_point_dev'
>> as returned by find_active_cell_around_point:
>>
>> FEValues<1> fe_vol(dof_handler_vol.get_fe(),
>>                    Quadrature<1>(ref_point_vol),
>>                    update_gradients | update_values);
>>
>> FEValues<1> fe_values_energy_dev(this->dof_handler_dev.get_fe(),
>>                                  Quadrature<1>(ref_point_dev),
>>                                  update_gradients | update_values);
>>
>> I figured out that the initialization of the two FEValues objects accounts
>> for the biggest portion of the times mentioned above. In particular, if I
>> comment the initialization out, I have
>> cpu time: 6.54 s, wall time: 6.55 s.
>>
>> The triangulations associated with dof_handler_vol and dof_handler_dev
>> are both 1d and store only 4 and 16 elements, respectively. That said, I am
>> wondering why the initialization takes so long (roughly 100 seconds wall
>> time in total) and why this causes a gap between the cpu and wall time.
>> Unfortunately, I have to create them anew whenever fun2 is called,
>> because the point 'ref_point_vol' (see Quadrature<1>(ref_point_vol)) is
>> different in each call to fun2.
>>
>> Best
>> Simon
>>
>>
>>


Re: [deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread 'yy.wayne' via deal.II User Group
Besides, the Trilinos direct solver applied was Amesos_Lapack (a mistake).
Changing to KLU therefore saves more time.

On Wednesday, October 19, 2022 at 20:30:53 UTC+8:

> I ran both the matrix-based and the matrix-free mode in release mode; both
> speed up a lot. The matrix-free CG iteration speeds up 30 times compared to
> debug mode. The coarse grid solver doesn't get sped up because I'm using the
> Trilinos interface.
> Matrix-based:
> [image: matrix-based_opt.png]
> 20.6 s = mg_matrices
>
> Matrix-free:
> [image: matrix-free_opt.png]
>
> Thank you so much Martin.
>
> Best,
> Wayne
>
> On Wednesday, October 19, 2022 at 19:52:36 UTC+8:
>
>> Dear Wayne,
>>
>> For performance it certainly matters, because some components of our 
>> codes have more low-level checks in debug mode than others, and because the 
>> compiler optimizations do not have the same effect on all parts of our 
>> code. Make sure to test the release mode and see if it makes more sense. 
>> We'd be happy to help from there.
>>
>> Best,
>> Martin
>> On 19.10.22 13:50, 'yy.wayne' via deal.II User Group wrote:
>>
>> Thanks Martin!
>>
>> I never thought about Debug vs. optimized mode before. The CMake output
>> says I'm using Debug mode.
>>
>> Some more information: the computation is done with deal.II 9.4.0 in an
>> Oracle VirtualBox VM, with 1 MPI process, in Qt Creator, and the CPU is an
>> Intel 10600KF. I didn't change the CMakeLists.txt and just copied it from
>> the examples, so I think by default it's debug mode.
>>
>> Best,
>> Wayne
>>
>> On Wednesday, October 19, 2022 at 19:40:01 UTC+8:
>>
>>> Dear Wayne,
>>>
>>> I am a bit surprised by your numbers and find them rather high, at least 
>>> with the chosen problem sizes. I would expect the matrix-free solver to run 
>>> in less than a second for 111,000 unknowns on typical computers, not almost 
>>> 10 seconds. I need to honestly say that I do not have a good explanation at 
>>> this point. I did not write this tutorial program, but I know more or less 
>>> what should happen. Let me ask a basic question first: Did you record the 
>>> timings with release mode? The numbers would make more sense if they are 
>>> based on the debug mode.
>>>
>>> Best,
>>> Martin
>>>
>>>
>>> On 19.10.22 12:08, 'yy.wayne' via deal.II User Group wrote:
>>>
>>> Thanks for your reply Peter, 
>>>
>>> The matrix-free run is basically the same as in step-75 except that I
>>> substituted the coarse grid solver. For fe_degree=6 without GMG, and with
>>> fe_degree decreasing by 1 per level for pMG, the solve_system() function
>>> runtime is 24.1s. It decomposes into *MatrixFree MG operators
>>> construction* (1.36s), MatrixFree MG transfers (2.73s), KLU coarse grid
>>> solver (5.7s), *setting smoother_data and compute_inverse_diagonal for
>>> level matrices* (3.4s), and CG iteration (9.8s).
>>>
>>> The two bold items cost a lot more (133s and 62s, respectively) in the
>>> matrix-based multigrid case. I noticed that, just as in step-16, the
>>> finest level matrix is assembled twice (once for system_matrix and once
>>> for mg_matrices[maxlevel]), so assembling takes more time.
>>>
>>> Best,
>>> Wayne
>>>
>>> On Wednesday, October 19, 2022 at 17:10:27 UTC+8:
>>>
 Hi Wayne, 

 your numbers make total sense. Don't forget that you are running at
 high order: degree=6! The number of non-zeroes per element stiffness
 matrix is ((degree + 1)^dim)^2 and the cost of computing the element
 stiffness matrix is even ((degree + 1)^dim)^3 if I am not mistaken
 (3 nested loops: i, j and q). Higher orders are definitely made for
 matrix-free algorithms!

 Out of curiosity: how large is the setup cost of MG in the case of the 
 matrix-free run? As a comment: don't be surprised that the setup costs are 
 relatively high compared to the solution process: you are probably setting 
 up a new Triangulation-, DoFHandler-, MatrixFree-, ... -object per level. In
 many simulations, you can reuse these objects, since you don't perform AMR 
 every time step. 

 Peter

 On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:

> Hello everyone, 
>
> I modified step-75 a little bit and tried to test its runtime. However,
> the result is kind of inexplicable from my point of view, especially the
> *disproportionate assemble time and solve time*. Here are some changes:
> 1. a matrix-based version of step-75 is constructed to compare with the
> matrix-free one.
> 2. no mesh refinement and no GMG, and fe_degree is constant across all
> cells within every cycle. fe_degree increases by one after each cycle. I
> make this setting to compare runtime due to fe_degree.
> 3. a direct solver on the coarsest grid. I think it won't affect runtime
> since the coarsest grid never changes.
>
> For the final cycle it has fe_degree=6 and DoFs=111,361.
> For the matrix-based method, overall runtime is 301s, where setup
> system (84s) and solve system (214s) take up most. In step-75, solve system
> actually did both multigrid matrices 

[deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Bruno Turcksin
Simon,

The best way to profile a code is to use a profiler. It can give a lot more
information than simple timers can. You say that your code is not
parallelized, but by default deal.II is multithreaded. Did you set
DEAL_II_NUM_THREADS=1? That could explain why CPU and Wall time are
different. Finally, if I understand correctly, you are calling the
constructor of FEValues about 2.5 million times. That means that one call
to the FEValues constructor takes 100/2.5e6 seconds, i.e., about 40
microseconds. That doesn't seem too slow.

Best,

Bruno

On Wednesday, October 19, 2022 at 7:51:55 AM UTC-4 Simon wrote:

> Dear all,
>
> I implemented two different versions to compute a stress for a given 
> strain and want to compare the associated computation times in release mode.
>
> version 1: stress = fun1(strain)   cpu time:  4.52 s   wall time:   4.53 s
> version 2: stress = fun2(strain)   cpu time: 32.5 s    wall time: 167.5 s
>
> fun1 and fun2, respectively, are invoked for all quadrature points 
> (1,286,144 in the above example) defined on the triangulation. My program 
> is not parallelized.
> In fun2, I call find_active_cell_around_point twice for two different
> points on two different (helper) triangulations and initialize two
> FEValues objects with the points 'ref_point_vol' and 'ref_point_dev'
> as returned by find_active_cell_around_point:
>
> FEValues<1> fe_vol(dof_handler_vol.get_fe(),
>                    Quadrature<1>(ref_point_vol),
>                    update_gradients | update_values);
>
> FEValues<1> fe_values_energy_dev(this->dof_handler_dev.get_fe(),
>                                  Quadrature<1>(ref_point_dev),
>                                  update_gradients | update_values);
>
> I figured out that the initialization of the two FEValues objects accounts
> for the biggest portion of the times mentioned above. In particular, if I
> comment the initialization out, I have
> cpu time: 6.54 s, wall time: 6.55 s.
>
> The triangulations associated with dof_handler_vol and dof_handler_dev are 
> both 1d and store only 4 and 16 elements, respectively. That said, I am 
> wondering why the initialization takes so long (roughly 100 seconds wall 
> time in total) and why this causes a gap between the cpu and wall time.
> Unfortunately, I have to create them anew whenever fun2 is called,
> because the point 'ref_point_vol' (see Quadrature<1>(ref_point_vol)) is
> different in each call to fun2. 
>
> Best
> Simon
>
>
>
>



Re: [deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread Martin Kronbichler

Dear Wayne,

For performance it certainly matters, because some components of our 
codes have more low-level checks in debug mode than others, and because 
the compiler optimizations do not have the same effect on all parts of 
our code. Make sure to test the release mode and see if it makes more 
sense. We'd be happy to help from there.
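
For reference, with the CMakeLists.txt shipped with the deal.II examples
this switch is typically 'make release' followed by a rebuild, or
reconfiguring with 'cmake -DCMAKE_BUILD_TYPE=Release .'.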


Best,
Martin

On 19.10.22 13:50, 'yy.wayne' via deal.II User Group wrote:

Thanks Martin!

I never thought about Debug vs. optimized mode before. The CMake output
says I'm using Debug mode.

Some more information: the computation is done with deal.II 9.4.0 in an
Oracle VirtualBox VM, with 1 MPI process, in Qt Creator, and the CPU is
an Intel 10600KF. I didn't change the CMakeLists.txt and just copied it
from the examples, so I think by default it's debug mode.


Best,
Wayne

On Wednesday, October 19, 2022 at 19:40:01 UTC+8:

Dear Wayne,

I am a bit surprised by your numbers and find them rather high, at
least with the chosen problem sizes. I would expect the
matrix-free solver to run in less than a second for 111,000
unknowns on typical computers, not almost 10 seconds. I need to
honestly say that I do not have a good explanation at this point.
I did not write this tutorial program, but I know more or less
what should happen. Let me ask a basic question first: Did you
record the timings with release mode? The numbers would make more
sense if they are based on the debug mode.

Best,
Martin


On 19.10.22 12:08, 'yy.wayne' via deal.II User Group wrote:

Thanks for your reply Peter,

The matrix-free run is basically the same as in step-75 except that I
substituted the coarse grid solver. For fe_degree=6 without GMG, and
with fe_degree decreasing by 1 per level for pMG, the solve_system()
function runtime is 24.1s. It decomposes into *MatrixFree MG
operators construction* (1.36s), MatrixFree MG transfers (2.73s),
KLU coarse grid solver (5.7s), *setting smoother_data and
compute_inverse_diagonal for level matrices* (3.4s), and CG
iteration (9.8s).

The two bold items cost a lot more (133s and 62s, respectively) in the
matrix-based multigrid case. I noticed that, just as in step-16, the
finest level matrix is assembled twice (once for system_matrix and one
for mg_matrices[maxlevel]), so assembling takes more time.

Best,
Wayne

On Wednesday, October 19, 2022 at 17:10:27 UTC+8:

Hi Wayne,

your numbers make total sense. Don't forget that you are running
at high order: degree=6! The number of non-zeroes per
element stiffness matrix is ((degree + 1)^dim)^2 and the cost of
computing the element stiffness matrix is even ((degree +
1)^dim)^3 if I am not mistaken (3 nested loops: i, j and q). Higher
orders are definitely made for matrix-free algorithms!

Out of curiosity: how large is the setup cost of MG in the
case of the matrix-free run? As a comment: don't be surprised
that the setup costs are relatively high compared to the
solution process: you are probably setting up a new
Triangulation-, DoFHander-, MatrixFree-, ... -object per
level. In many simulations, you can reuse these objects,
since you don't perform AMR every time step.

Peter

On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:

Hello everyone,

I modified step-75 a little bit and tried to test its runtime.
However, the result is kind of inexplicable from my point of
view, especially the *disproportionate assemble time and solve
time*. Here are some changes:
1. a matrix-based version of step-75 is constructed to compare
with the matrix-free one.
2. no mesh refinement and no GMG, and fe_degree is constant
across all cells within every cycle. fe_degree increases by one
after each cycle. I make this setting to compare runtime due to
fe_degree.
3. a direct solver on the coarsest grid. I think it won't affect
runtime since the coarsest grid never changes.

For the final cycle it has fe_degree=6 and DoFs=111,361.
For the matrix-based method, overall runtime is 301s, where setup
system (84s) and solve system (214s) take up most. In step-75,
solve system actually did multigrid matrix assembling,
smoother construction, and CG solving. Runtime of this case is
shown:
[image: matrix-based.png]
On each level I print the time for assembling the level matrix.
*The solve system is mostly decomposed into MG matrices
assembling (83.9+33.6+...=133s), smoother set up (65s), coarse
grid solve (6s) and CG solve (2.56s).* My doubt is why the actual
CG solve only takes 2.56 out of 301 seconds for this problem? The
time spent on assembling and smoother construction accounts for
so much that it seems a burden.


[deal.II] measuring cpu and wall time for assembly routine

2022-10-19 Thread Simon
Dear all,

I implemented two different versions to compute a stress for a given strain 
and want to compare the associated computation times in release mode.

version 1: stress = fun1(strain)   cpu time:  4.52 s   wall time:   4.53 s
version 2: stress = fun2(strain)   cpu time: 32.5 s    wall time: 167.5 s

fun1 and fun2, respectively, are invoked for all quadrature points 
(1,286,144 in the above example) defined on the triangulation. My program 
is not parallelized.
In fun2, I call find_active_cell_around_point twice for two different
points on two different (helper) triangulations and initialize two
FEValues objects with the points 'ref_point_vol' and 'ref_point_dev'
as returned by find_active_cell_around_point:

FEValues<1> fe_vol(dof_handler_vol.get_fe(),
                   Quadrature<1>(ref_point_vol),
                   update_gradients | update_values);

FEValues<1> fe_values_energy_dev(this->dof_handler_dev.get_fe(),
                                 Quadrature<1>(ref_point_dev),
                                 update_gradients | update_values);

I figured out that the initialization of the two FEValues objects accounts
for the biggest portion of the times mentioned above. In particular, if I
comment the initialization out, I have
cpu time: 6.54 s, wall time: 6.55 s.

The triangulations associated with dof_handler_vol and dof_handler_dev are 
both 1d and store only 4 and 16 elements, respectively. That said, I am 
wondering why the initialization takes so long (roughly 100 seconds wall 
time in total) and why this causes a gap between the cpu and wall time.
Unfortunately, I have to create them anew whenever fun2 is called,
because the point 'ref_point_vol' (see Quadrature<1>(ref_point_vol)) is
different in each call to fun2. 

Best
Simon





Re: [deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread 'yy.wayne' via deal.II User Group
Thanks Martin!

I never thought about Debug vs. optimized mode before. The CMake output says
I'm using Debug mode.

Some more information: the computation is done with deal.II 9.4.0 in an
Oracle VirtualBox VM, with 1 MPI process, in Qt Creator, and the CPU is an
Intel 10600KF. I didn't change the CMakeLists.txt and just copied it from the
examples, so I think by default it's debug mode.

Best,
Wayne

On Wednesday, October 19, 2022 at 19:40:01 UTC+8:

> Dear Wayne,
>
> I am a bit surprised by your numbers and find them rather high, at least 
> with the chosen problem sizes. I would expect the matrix-free solver to run 
> in less than a second for 111,000 unknowns on typical computers, not almost 
> 10 seconds. I need to honestly say that I do not have a good explanation at 
> this point. I did not write this tutorial program, but I know more or less 
> what should happen. Let me ask a basic question first: Did you record the 
> timings with release mode? The numbers would make more sense if they are 
> based on the debug mode.
>
> Best,
> Martin
>
>
> On 19.10.22 12:08, 'yy.wayne' via deal.II User Group wrote:
>
> Thanks for your reply Peter, 
>
> The matrix-free run is basically the same as in step-75 except that I
> substituted the coarse grid solver. For fe_degree=6 without GMG, and with
> fe_degree decreasing by 1 per level for pMG, the solve_system() function
> runtime is 24.1s. It decomposes into *MatrixFree MG operators
> construction* (1.36s), MatrixFree MG transfers (2.73s), KLU coarse grid
> solver (5.7s), *setting smoother_data and compute_inverse_diagonal for
> level matrices* (3.4s), and CG iteration (9.8s).
>
> The two bold items cost a lot more (133s and 62s, respectively) in the
> matrix-based multigrid case. I noticed that, just as in step-16, the
> finest level matrix is assembled twice (once for system_matrix and once
> for mg_matrices[maxlevel]), so assembling takes more time.
>
> Best,
> Wayne
>
> On Wednesday, October 19, 2022 at 17:10:27 UTC+8:
>
>> Hi Wayne, 
>>
>> your numbers make total sense. Don't forget that you are running at
>> high order: degree=6! The number of non-zeroes per element stiffness matrix
>> is ((degree + 1)^dim)^2 and the cost of computing the element stiffness
>> matrix is even ((degree + 1)^dim)^3 if I am not mistaken (3 nested loops:
>> i, j and q). Higher orders are definitely made for matrix-free algorithms!
>>
>> Out of curiosity: how large is the setup cost of MG in the case of the 
>> matrix-free run? As a comment: don't be surprised that the setup costs are 
>> relatively high compared to the solution process: you are probably setting 
>> up a new Triangulation-, DoFHandler-, MatrixFree-, ... -object per level. In 
>> many simulations, you can reuse these objects, since you don't perform AMR 
>> every time step. 
>>
>> Peter
>>
>> On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:
>>
>>> Hello everyone, 
>>>
>>> I modified step-75 a little bit and tried to test its runtime. However,
>>> the result is kind of inexplicable from my point of view, especially the
>>> *disproportionate assemble time and solve time*. Here are some changes:
>>> 1. a matrix-based version of step-75 is constructed to compare with the
>>> matrix-free one.
>>> 2. no mesh refinement and no GMG, and fe_degree is constant across all
>>> cells within every cycle. fe_degree increases by one after each cycle. I
>>> make this setting to compare runtime due to fe_degree.
>>> 3. a direct solver on the coarsest grid. I think it won't affect runtime
>>> since the coarsest grid never changes.
>>>
>>> For the final cycle it has fe_degree=6 and DoFs=111,361.
>>> For the matrix-based method, overall runtime is 301s, where setup
>>> system (84s) and solve system (214s) take up most. In step-75, solve
>>> system actually did multigrid matrix assembling, smoother construction,
>>> and CG solving. Runtime of this case is shown:
>>> [image: matrix-based.png]
>>> On each level I print the time for assembling the level matrix. *The
>>> solve system is mostly decomposed into MG matrices
>>> assembling (83.9+33.6+...=133s), smoother set up (65s), coarse grid
>>> solve (6s) and CG solve (2.56s).* My doubt is why the actual CG solve
>>> only takes 2.56 out of 301 seconds for this problem? The time spent on
>>> assembling and smoother construction accounts for so much that it seems
>>> a burden.
>>>
>>> For the matrix-free method, however, runtime is much smaller without
>>> assembling matrices. Besides, the CG solve costs more because of the
>>> extra computation required by matrix-free, I guess. But that the
>>> *smoother construction time reduces significantly* as well is beyond my
>>> expectation.
>>> [image: matrix-free.png]
>>>
>>> The matrix-free framework saves assembling time, but it seems too
>>> efficient to be real. The text in bold is my main confusion. Can someone
>>> share some experience on matrix-free and multigrid methods' time
>>> consumption?
>>>
>>> Best,
>>> Wayne
>>>

Re: [deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread Martin Kronbichler

Dear Wayne,

I am a bit surprised by your numbers and find them rather high, at least 
with the chosen problem sizes. I would expect the matrix-free solver to 
run in less than a second for 111,000 unknowns on typical computers, not 
almost 10 seconds. I need to honestly say that I do not have a good 
explanation at this point. I did not write this tutorial program, but I 
know more or less what should happen. Let me ask a basic question first: 
Did you record the timings with release mode? The numbers would make 
more sense if they are based on the debug mode.


Best,
Martin


On 19.10.22 12:08, 'yy.wayne' via deal.II User Group wrote:

Thanks for your reply Peter,

The matrix-free run is basically the same as in step-75 except that I
substituted the coarse grid solver. For fe_degree=6 without GMG, and with
fe_degree decreasing by 1 per level for pMG, the solve_system() function
runtime is 24.1s. It decomposes into *MatrixFree MG operators
construction* (1.36s), MatrixFree MG transfers (2.73s), KLU coarse grid
solver (5.7s), *setting smoother_data and compute_inverse_diagonal for
level matrices* (3.4s), and CG iteration (9.8s).

The two bold items cost a lot more (133s and 62s, respectively) in the
matrix-based multigrid case. I noticed that, just as in step-16, the finest
level matrix is assembled twice (once for system_matrix and once for
mg_matrices[maxlevel]), so assembling takes more time.


Best,
Wayne

On Wednesday, October 19, 2022 at 17:10:27 UTC+8:

Hi Wayne,

your numbers make total sense. Don't forget that you are running
at high order: degree=6! The number of non-zeroes per
element stiffness matrix is ((degree + 1)^dim)^2 and the cost of
computing the element stiffness matrix is even ((degree +
1)^dim)^3 if I am not mistaken (3 nested loops: i, j and q). Higher
orders are definitely made for matrix-free algorithms!

Out of curiosity: how large is the setup cost of MG in the case of
the matrix-free run? As a comment: don't be surprised that the
setup costs are relatively high compared to the solution process:
you are probably setting up a new Triangulation-, DoFHandler-,
MatrixFree-, ... -object per level. In many simulations, you can
reuse these objects, since you don't perform AMR every time step.

Peter

On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:

Hello everyone,

I modified step-75 a little bit and tried to test its runtime.
However, the result is kind of inexplicable from my point of
view, especially the *disproportionate assemble time and solve
time*. Here are some changes:
1. a matrix-based version of step-75 is constructed to compare
with the matrix-free one.
2. no mesh refinement and no GMG, and fe_degree is constant
across all cells within every cycle. fe_degree increases by one
after each cycle. I make this setting to compare runtime due to
fe_degree.
3. a direct solver on the coarsest grid. I think it won't affect
runtime since the coarsest grid never changes.

For the final cycle it has fe_degree=6 and DoFs=111,361.
For the matrix-based method, overall runtime is 301s, where setup
system (84s) and solve system (214s) take up most. In step-75,
solve system actually did multigrid matrix assembling,
smoother construction, and CG solving. Runtime of this case is
shown:
[image: matrix-based.png]
On each level I print the time for assembling the level matrix.
*The solve system is mostly decomposed into MG matrices
assembling (83.9+33.6+...=133s), smoother set up (65s), coarse
grid solve (6s) and CG solve (2.56s).* My doubt is why the actual
CG solve only takes 2.56 out of 301 seconds for this problem? The
time spent on assembling and smoother construction accounts for
so much that it seems a burden.

For the matrix-free method, however, runtime is much smaller
without assembling matrices. Besides, the CG solve costs more
because of the extra computation required by matrix-free, I guess.
But that the *smoother construction time reduces significantly*
as well is beyond my expectation.
[image: matrix-free.png]

The matrix-free framework saves assembling time, but it seems too
efficient to be real. The text in bold is my main confusion. Can
someone share some experience on matrix-free and multigrid
methods' time consumption?

Best,
Wayne

[deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread 'yy.wayne' via deal.II User Group
Thanks for your reply Peter,

The matrix-free run is basically the same as in step-75 except that I
substituted the coarse grid solver. For fe_degree=6 without GMG, and with
fe_degree decreasing by 1 per level for pMG, the solve_system() function
runtime is 24.1s. It decomposes into *MatrixFree MG operators
construction* (1.36s), MatrixFree MG transfers (2.73s), KLU coarse grid
solver (5.7s), *setting smoother_data and compute_inverse_diagonal for
level matrices* (3.4s), and CG iteration (9.8s).

The two bold items cost a lot more (133s and 62s, respectively) in the
matrix-based multigrid case. I noticed that, just as in step-16, the finest
level matrix is assembled twice (once for system_matrix and once for
mg_matrices[maxlevel]), so assembling takes more time.

Best,
Wayne

On Wednesday, October 19, 2022 at 17:10:27 UTC+8:

> Hi Wayne,
>
> your numbers make total sense. Don't forget that you are running at
> high order: degree=6! The number of non-zeroes per element stiffness matrix
> is ((degree + 1)^dim)^2 and the cost of computing the element stiffness
> matrix is even ((degree + 1)^dim)^3 if I am not mistaken (3 nested loops:
> i, j and q). Higher orders are definitely made for matrix-free algorithms!
>
> Out of curiosity: how large is the setup cost of MG in the case of the 
> matrix-free run? As a comment: don't be surprised that the setup costs are 
> relatively high compared to the solution process: you are probably setting 
> up a new Triangulation-, DoFHandler-, MatrixFree-, ... -object per level. In 
> many simulations, you can reuse these objects, since you don't perform AMR 
> every time step. 
>
> Peter
>
> On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:
>
>> Hello everyone,
>>
>> I modified step-75 a little bit and tried to test its runtime. However,
>> the result is kind of inexplicable from my point of view, especially the
>> *disproportionate assemble time and solve time*. Here are some changes:
>> 1. a matrix-based version of step-75 is constructed to compare with the
>> matrix-free one.
>> 2. no mesh refinement and no GMG, and fe_degree is constant across all
>> cells within every cycle. fe_degree increases by one after each cycle. I
>> make this setting to compare runtime due to fe_degree.
>> 3. a direct solver on the coarsest grid. I think it won't affect runtime
>> since the coarsest grid never changes.
>>
>> For the final cycle it has fe_degree=6 and DoFs=111,361.
>> For the matrix-based method, overall runtime is 301s, where setup
>> system (84s) and solve system (214s) take up most. In step-75, solve
>> system actually did multigrid matrix assembling, smoother construction,
>> and CG solving. Runtime of this case is shown:
>> [image: matrix-based.png]
>> On each level I print the time for assembling the level matrix. *The
>> solve system is mostly decomposed into MG matrices
>> assembling (83.9+33.6+...=133s), smoother set up (65s), coarse grid
>> solve (6s) and CG solve (2.56s).* My doubt is why the actual CG solve
>> only takes 2.56 out of 301 seconds for this problem? The time spent on
>> assembling and smoother construction accounts for so much that it seems
>> a burden.
>>
>> For the matrix-free method, however, runtime is much smaller without
>> assembling matrices. Besides, the CG solve costs more because of the
>> extra computation required by matrix-free, I guess. But that the
>> *smoother construction time reduces significantly* as well is beyond my
>> expectation.
>> [image: matrix-free.png]
>>
>> The matrix-free framework saves assembling time, but it seems too
>> efficient to be real. The text in bold is my main confusion. Can someone
>> share some experience on matrix-free and multigrid methods' time
>> consumption?
>>
>> Best,
>> Wayne
>>
>>



[deal.II] Re: Run time analysis for step-75 with matrix-free or matrix-based method

2022-10-19 Thread Peter Munch
Hi Wayne,

your numbers make total sense. Don't forget that you are running at high
order: degree=6! The number of non-zeroes per element stiffness matrix is
((degree + 1)^dim)^2 and the cost of computing the element stiffness matrix
is even ((degree + 1)^dim)^3 if I am not mistaken (3 nested loops: i, j and
q). Higher orders are definitely made for matrix-free algorithms!
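
To make that scaling concrete, a bit of added arithmetic (assuming the 2D
setting of step-75): with degree = 6 one has (degree + 1)^dim = 7^2 = 49
DoFs per cell, so each element matrix holds 49^2 = 2,401 entries and
assembling it costs on the order of 49^3 = 117,649 operations per cell; in
3D the same degree would give 343 DoFs per cell, 343^2 = 117,649 entries,
and roughly 4.0e7 operations per cell.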

Out of curiosity: how large is the setup cost of MG in the case of the 
matrix-free run? As a comment: don't be surprised that the setup costs are 
relatively high compared to the solution process: you are probably setting 
up a new Triangulation-, DoFHandler-, MatrixFree-, ... -object per level. In 
many simulations, you can reuse these objects, since you don't perform AMR 
every time step. 

Peter

On Wednesday, 19 October 2022 at 10:38:34 UTC+2 yy.wayne wrote:

> Hello everyone,
>
> I modified step-75 a little bit and tried to test its runtime. However,
> the result is kind of inexplicable from my point of view, especially the
> *disproportionate assemble time and solve time*. Here are some changes:
> 1. a matrix-based version of step-75 is constructed to compare with the
> matrix-free one.
> 2. no mesh refinement and no GMG, and fe_degree is constant across all
> cells within every cycle. fe_degree increases by one after each cycle. I
> make this setting to compare runtime due to fe_degree.
> 3. a direct solver on the coarsest grid. I think it won't affect runtime
> since the coarsest grid never changes.
>
> For the final cycle it has fe_degree=6 and DoFs=111,361.
> For the matrix-based method, overall runtime is 301s, where setup
> system (84s) and solve system (214s) take up most. In step-75, solve
> system actually did multigrid matrix assembling, smoother construction,
> and CG solving. Runtime of this case is shown:
> [image: matrix-based.png]
> On each level I print the time for assembling the level matrix. *The
> solve system is mostly decomposed into MG matrices
> assembling (83.9+33.6+...=133s), smoother set up (65s), coarse grid
> solve (6s) and CG solve (2.56s).* My doubt is why the actual CG solve
> only takes 2.56 out of 301 seconds for this problem? The time spent on
> assembling and smoother construction accounts for so much that it seems
> a burden.
>
> For the matrix-free method, however, runtime is much smaller without
> assembling matrices. Besides, the CG solve costs more because of the
> extra computation required by matrix-free, I guess. But that the
> *smoother construction time reduces significantly* as well is beyond my
> expectation.
> [image: matrix-free.png]
>
> The matrix-free framework saves assembling time, but it seems too
> efficient to be real. The text in bold is my main confusion. Can someone
> share some experience on matrix-free and multigrid methods' time
> consumption?
>
> Best,
> Wayne
>
>
