Re: [Libmesh-devel] Shape function vectorization

2016-01-11 Thread Derek Gaston
Tim,

Thanks for the work on this... and for reporting back with some results. When you want to work on this again I can definitely help you nail down a good test problem.

Derek

On Mon, Jan 11, 2016 at 10:02 AM Tim Adowski wrote:
> Hi all:
>
> I first want to make a clarification about the 10%

Re: [Libmesh-devel] Shape function vectorization

2016-01-11 Thread Tim Adowski
Hi all:

I first want to make a clarification about the 10% speedup I originally reported. That speedup was for the quadrature point loop only, not the total runtime. For my original test case, the overall runtime was approx. 6.5 sec and the QP loop was 0.1 sec (not 1 sec as was originally given). That

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Jed Brown
Derek Gaston writes:
> 4. Loading up dense matrices of shape function evaluations so that all
> variables' values and gradients (of the same FEType) can be computed
> simultaneously with one single mat*vec. This creates perfectly vectorizable
> operations. I tried using Eigen for this as well as t
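
The idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Derek's implementation and not libMesh or Eigen API: the function name `evaluate_variables`, the flat row-major storage, and the argument layout are all hypothetical. Shape function values are packed into one dense n_qp x n_dofs array, the coefficients of all variables sharing an FEType into an n_dofs x n_vars array, and every variable's value at every quadrature point comes out of a single dense product whose inner loop has unit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of libMesh).
// phi:    n_qp x n_dofs, row-major; phi[qp * n_dofs + i] is shape
//         function i evaluated at quadrature point qp.
// coeffs: n_dofs x n_vars, row-major; coeffs[i * n_vars + v] is the
//         i-th DoF coefficient of variable v.
// Returns an n_qp x n_vars row-major array of variable values.
std::vector<double> evaluate_variables(const std::vector<double>& phi,
                                       const std::vector<double>& coeffs,
                                       std::size_t n_qp,
                                       std::size_t n_dofs,
                                       std::size_t n_vars)
{
  assert(phi.size() == n_qp * n_dofs);
  assert(coeffs.size() == n_dofs * n_vars);

  std::vector<double> values(n_qp * n_vars, 0.0);
  const double* p = phi.data();
  const double* c = coeffs.data();
  double* u = values.data();

  for (std::size_t qp = 0; qp < n_qp; ++qp)
    for (std::size_t i = 0; i < n_dofs; ++i)
    {
      const double phi_i = p[qp * n_dofs + i];
      // Unit-stride loop over variables: the part a compiler
      // is most likely to be able to vectorize.
      for (std::size_t v = 0; v < n_vars; ++v)
        u[qp * n_vars + v] += phi_i * c[i * n_vars + v];
    }

  return values;
}
```

With a single variable (n_vars = 1) this reduces to the mat*vec mentioned in the quote. Gradients could be handled the same way with one such array per spatial direction, which is again an assumption about layout rather than something stated in the thread.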

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Tim Adowski
On Tue, Jan 5, 2016 at 8:43 AM, Paul T. Bauman wrote:
>
> On Tue, Jan 5, 2016 at 12:30 AM, Roy Stogner wrote:
>>
>> On Mon, 4 Jan 2016, Tim Adowski wrote:
>>
>> > However, all versions of GCC were unable to vectorize the Ke loop
>> > due to "bad data ref", and both Intel versions required

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Paul T. Bauman
On Tue, Jan 5, 2016 at 9:33 AM, John Peterson wrote:
>
> I wonder if you would have more luck doing your vectorization within
> FEMSystem rather than on "old style" codes like ex3? In FEMSystem, the
> framework has more control over the loops, iteration counters, and variable
> storage, so it m

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread John Peterson
On Mon, Jan 4, 2016 at 8:58 PM, Tim Adowski wrote:
> Hello libMesh-devel:
>
> Using the START_LOG/STOP_LOG macros to time the QP loop, I got the
> following results for the 1D shape function implementation:
> Original: 1028ms (average)
> Modified with Fe loop vectorized: 920ms (average)
> Differe

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Paul T. Bauman
On Tue, Jan 5, 2016 at 2:31 AM, Derek Gaston wrote:
> Maybe I never discussed these results anywhere... because I can't find any
> discussion of them :-)
>
> Here are the timing results:
> https://docs.google.com/spreadsheets/d/1dS0MjyTYRQ_o8pBsuGncJtSYxEYqKw42Chr8ougD1ek/edit?usp=docslist_api

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Paul T. Bauman
On Tue, Jan 5, 2016 at 1:50 AM, Derek Gaston wrote:
>
> I did a bunch of work on this myself a few years ago... all I was
> attempting to speed up was just variable value evaluation... not Re/Ke
> evaluation as a whole. Let me see if I can dig up what I did... (I'll do some
> searching and send an

Re: [Libmesh-devel] Shape function vectorization

2016-01-05 Thread Paul T. Bauman
On Tue, Jan 5, 2016 at 12:30 AM, Roy Stogner wrote:
>
> On Mon, 4 Jan 2016, Tim Adowski wrote:
>
> > However, all versions of GCC were unable to vectorize the Ke loop
> > due to "bad data ref", and both Intel versions required "#pragma
> > ivdep" in order to vectorize the Ke loop.
>
> One last th

Re: [Libmesh-devel] Shape function vectorization

2016-01-04 Thread Derek Gaston
Maybe I never discussed these results anywhere... because I can't find any discussion of them :-)

Here are the timing results:
https://docs.google.com/spreadsheets/d/1dS0MjyTYRQ_o8pBsuGncJtSYxEYqKw42Chr8ougD1ek/edit?usp=docslist_api

I'll attempt to explain a little... All I was timing was variab

Re: [Libmesh-devel] Shape function vectorization

2016-01-04 Thread Derek Gaston
Yes, try to do the vector indexing yourself first to see if it's the operator calls that are throwing things off.

I did a bunch of work on this myself a few years ago... all I was attempting to speed up was just variable value evaluation... not Re/Ke evaluation as a whole. Let me see if I can dig
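
One way to try the "do the vector indexing yourself" experiment is sketched below. It is a minimal illustration, assuming a simplified residual loop: the name `assemble_fe`, the flat quadrature-point-major storage of the shape function values, and the constant `source` term are all hypothetical and are not taken from ex3 or from libMesh. Raw pointers are hoisted out of the loops so that the inner loop body contains only plain array indexing and no container or matrix `operator()` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an element residual (Fe) loop.
// phi is stored flat and quadrature-point-major:
//   phi[qp * n_dofs + i] = shape function i at quadrature point qp.
void assemble_fe(const std::vector<double>& phi,
                 const std::vector<double>& JxW,
                 double source,
                 std::vector<double>& Fe)
{
  const std::size_t n_qp = JxW.size();
  const std::size_t n_dofs = Fe.size();
  assert(phi.size() == n_qp * n_dofs);

  // Hoist raw pointers so the inner loop does plain indexing only.
  const double* p = phi.data();
  const double* jxw = JxW.data();
  double* fe = Fe.data();

  for (std::size_t qp = 0; qp < n_qp; ++qp)
  {
    const double w = jxw[qp] * source;   // loop-invariant factor
    const double* phi_qp = p + qp * n_dofs;
    // Unit-stride loop over DoFs with no function calls in the body.
    for (std::size_t i = 0; i < n_dofs; ++i)
      fe[i] += w * phi_qp[i];
  }
}
```

If the compiler's vectorization report accepts this version but rejects the original, that would point at the operator calls or the storage layout rather than at the arithmetic. The compiler may still have to deal with possible aliasing between `fe` and `phi_qp`, which is presumably where a hint such as `#pragma ivdep` comes in.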

Re: [Libmesh-devel] Shape function vectorization

2016-01-04 Thread Roy Stogner
On Mon, 4 Jan 2016, Tim Adowski wrote:
> However, all versions of GCC were unable to vectorize the Ke loop
> due to "bad data ref", and both Intel versions required "#pragma
> ivdep" in order to vectorize the Ke loop.

One last thought: is it possible that what is confusing gcc isn't your class,

Re: [Libmesh-devel] Shape function vectorization

2016-01-04 Thread Roy Stogner
On Mon, 4 Jan 2016, Tim Adowski wrote:
> Using the START_LOG/STOP_LOG macros to time the QP loop, I got the following
> results for the 1D shape function implementation:
> Original: 1028ms (average)
> Modified with Fe loop vectorized: 920ms (average)
> Difference: ~10%
> So there is a significan

[Libmesh-devel] Shape function vectorization

2016-01-04 Thread Tim Adowski
Hello libMesh-devel:

I have been working to replace the current std::vector > implementation of the shape functions, with the goal of achieving vectorization when the shape function values are used. However, I have only had partial success, and I am all out of ideas. I would appreciate any feedba