On Fri, Jul 24, 2015 at 1:58 PM, Ian Hinder <[email protected]> wrote:
>
> On 24 Jul 2015, at 19:42, Erik Schnetter <[email protected]> wrote:
>
> On Fri, Jul 24, 2015 at 1:39 PM, Ian Hinder <[email protected]> wrote:
>
>>
>> On 24 Jul 2015, at 19:15, Erik Schnetter <[email protected]> wrote:
>>
>> On Fri, Jul 24, 2015 at 11:57 AM, Ian Hinder <[email protected]>
>> wrote:
>>
>>>
>>> On 8 Jul 2015, at 16:53, Ian Hinder <[email protected]> wrote:
>>>
>>>
>>> On 8 Jul 2015, at 15:14, Erik Schnetter <[email protected]> wrote:
>>>
>>> I added a second benchmark, using a Thornburg04 patch system, 8th order
>>> finite differencing, and 4th order patch interpolation. The results are
>>>
>>> original: 8.53935e-06 sec
>>> rewrite: 8.55188e-06 sec
>>>
>>> this time with 1 thread per MPI process, since that was most efficient
>>> in both cases. Most of the time is spent in inter-patch interpolation,
>>> which is much more expensive than in a "regular" case since this benchmark
>>> is run on a single node and hence with very small grids.
>>>
>>> With these numbers under our belt, can we merge the rewrite branch?
>>>
>>>
>>> The "jacobian" benchmark that I gave you was still a pure kernel
>>> benchmark, involving no interpatch interpolation. It just measured the
>>> speed of the RHSs when Jacobians were included. I would also not use a
>>> single-threaded benchmark with very small grid sizes; this might have been
>>> fastest in this artificial case, but in practice I don't think we would use
>>> that configuration. The benchmark you have now run seems to be more of a
>>> "complete system" benchmark, which is useful, but different.
>>>
>>> I think it is important that the kernel itself has not gotten slower,
>>> even if the kernel is not currently a major contributor to runtime. We
>>> specifically split out the advection derivatives because they made the code
>>> with 8th order and Jacobians a fair bit slower. I would just like to see
>>> that this is not still the case with the new version, which has changed the
>>> way this is handled.
>>>
>>>
>>> I have now run my benchmarks on both the original and the rewritten
>>> McLachlan. I seem to find that the ML_BSSN_* functions in
>>> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint
>>> calculations, are between 11% and 15% slower with the rewrite branch,
>>> depending on the details of the evolution. See attached plot. This is on
>>> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).
>>>
>>
>> What exactly do you measure -- which bins or routines? Does this involve
>> communication? Are you using thorn Dissipation?
>>
>>
>> I take all the timers in Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns
>> that start with ML_BSSN_ and eliminate the ones containing "constraints"
>> (case insensitive). This is running on two processes, one node, 6 threads
>> per node. Threads are correctly bound to cores. There is ghostzone
>> exchange between the processes, so yes, there is communication in the
>> ML_BSSN_SelectBCs SYNC calls, but it is node-local.
>>
>
> Can you include thorn Dissipation in the "before" case, and use
> McLachlan's dissipation in the "after" case?
>
>
> There is no dissipation in either case.
>
> The output data is in
>
>
> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/orig/20150724-174334
>
> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/rewrite/20150724-170542
>
> including the parameter files.
>
> Actually, what I said before was wrong; the timers I am using are under
> "thorns", not "syncs", so even the node-local communication should not be
> counted.
>

McLachlan has not been optimized for runs without dissipation. If you think
this is important, then we can introduce a special case. I expect this to
improve performance. However, running BSSN without dissipation is not what
one would do in production, so I didn't investigate this case.

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
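
The timer selection described in the thread (all timers under
Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns whose names start with
ML_BSSN_, excluding those containing "constraints", case-insensitively)
could be sketched roughly as below. This is only an illustration, not the
script actually used for the benchmarks: it assumes the timer tree has
already been flattened into a dict mapping full timer paths to seconds, and
the function names, the dict layout, and the sample timer names and values
are all hypothetical.

```python
# Sketch only: the flattened {path: seconds} layout is an assumption,
# not the actual timer output format.
THORNS_PREFIX = "Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns/"


def ml_bssn_kernel_time(timers):
    """Sum the ML_BSSN_* timers under the 'thorns' bin, skipping constraints."""
    total = 0.0
    for path, seconds in timers.items():
        # Only timers under "thorns" count; "syncs" and others are ignored.
        if not path.startswith(THORNS_PREFIX):
            continue
        name = path[len(THORNS_PREFIX):]
        if not name.startswith("ML_BSSN_"):
            continue
        # Case-insensitive exclusion of the constraint calculations.
        if "constraints" in name.lower():
            continue
        total += seconds
    return total


def relative_slowdown(orig_seconds, rewrite_seconds):
    """Fractional slowdown of the rewrite relative to the original."""
    return (rewrite_seconds - orig_seconds) / orig_seconds


# Hypothetical sample data; timer names and values are made up.
sample_timers = {
    THORNS_PREFIX + "ML_BSSN_kernel_a": 2.0,
    THORNS_PREFIX + "ML_BSSN_Constraints_b": 5.0,
    THORNS_PREFIX + "OtherThorn_c": 7.0,
    "Evolve/CallEvol/CCTK_EVOL/CallFunction/syncs/ML_BSSN_kernel_a": 3.0,
}
sample_kernel_time = ml_bssn_kernel_time(sample_timers)
```

Applied to the two "complete system" timings quoted earlier in the thread,
relative_slowdown(8.53935e-06, 8.55188e-06) comes out at about 0.15%, well
below the 11% to 15% reported for the kernel-only timers.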
