On Fri, Jul 24, 2015 at 11:57 AM, Ian Hinder <[email protected]> wrote:
> On 8 Jul 2015, at 16:53, Ian Hinder <[email protected]> wrote:
>
> > On 8 Jul 2015, at 15:14, Erik Schnetter <[email protected]> wrote:
> >
> > > I added a second benchmark, using a Thornburg04 patch system, 8th order
> > > finite differencing, and 4th order patch interpolation. The results are
> > >
> > > original: 8.53935e-06 sec
> > > rewrite:  8.55188e-06 sec
> > >
> > > this time with 1 thread per MPI process, since that was most efficient in
> > > both cases. Most of the time is spent in inter-patch interpolation, which
> > > is much more expensive than in a "regular" case since this benchmark is run
> > > on a single node and hence with very small grids.
> > >
> > > With these numbers under our belt, can we merge the rewrite branch?
> >
> > The "jacobian" benchmark that I gave you was still a pure kernel
> > benchmark, involving no interpatch interpolation. It just measured the
> > speed of the RHSs when Jacobians were included. I would also not use a
> > single-threaded benchmark with very small grid sizes; this might have been
> > fastest in this artificial case, but in practice I don't think we would use
> > that configuration. The benchmark you have now run seems to be more of a
> > "complete system" benchmark, which is useful, but different.
> >
> > I think it is important that the kernel itself has not gotten slower, even
> > if the kernel is not currently a major contributor to runtime. We
> > specifically split out the advection derivatives because they made the code
> > with 8th order and Jacobians a fair bit slower. I would just like to see
> > that this is not still the case with the new version, which has changed the
> > way this is handled.
>
> I have now run my benchmarks on both the original and the rewritten
> McLachlan. I seem to find that the ML_BSSN_* functions in
> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint
> calculations, are between 11% and 15% slower with the rewrite branch,
> depending on the details of the evolution. See attached plot. This is on
> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).

What exactly do you measure -- which bins or routines? Does this involve
communication? Are you using thorn Dissipation?

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
