On 8 Jul 2015, at 15:14, Erik Schnetter <[email protected]> wrote:
> I added a second benchmark, using a Thornburg04 patch system, 8th order
> finite differencing, and 4th order patch interpolation. The results are
>
> original: 8.53935e-06 sec
> rewrite: 8.55188e-06 sec
>
> this time with 1 thread per MPI process, since that was most efficient in
> both cases. Most of the time is spent in inter-patch interpolation, which is
> much more expensive than in a "regular" case since this benchmark is run on a
> single node and hence with very small grids.
>
> With these numbers under our belt, can we merge the rewrite branch?

The "jacobian" benchmark that I gave you was still a pure kernel benchmark, involving no inter-patch interpolation. It just measured the speed of the RHSs when Jacobians were included. I would also not use a single-threaded benchmark with very small grid sizes; this might have been fastest in this artificial case, but in practice I don't think we would use that configuration. The benchmark you have now run seems to be more of a "complete system" benchmark, which is useful, but different.

I think it is important that the kernel itself has not gotten slower, even if the kernel is not currently a major contributor to runtime. We specifically split out the advection derivatives because they made the code with 8th order and Jacobians a fair bit slower. I would just like to see that this is not still the case with the new version, which has changed the way this is handled.

> -erik
>
> On Sat, Jul 4, 2015 at 5:19 PM, Ian Hinder <[email protected]> wrote:
> hi Erik,
>
> You could try the ones at
>
> https://bitbucket.org/ianhinder/cactusbench/src/faea4e13ed4232968e81edd1bbc80519198fe2b2/examples/ML_BSSN_Test/benchmark/?at=master
>
> I haven't updated them in a while, but hopefully the ET is sufficiently
> backward compatible for them to still work.
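As an aside on why the advection derivatives were split out: high-order stencils involve many more points, and upwinded advection stencils are wider still. A small sketch (centered stencils only, for simplicity; McLachlan's actual Kranc-generated upwinded advection stencils are not reproduced here) that derives the coefficients by Taylor matching and shows how the stencil grows with order:

```python
import numpy as np

def centered_first_derivative_coeffs(order):
    """Coefficients c_j such that f'(x) ~ (1/h) * sum_j c_j * f(x + j*h)
    for j = -m..m, where the centered stencil has accuracy `order` = 2*m."""
    m = order // 2
    offsets = np.arange(-m, m + 1)
    # Taylor-matching conditions: sum_j c_j * j**k = 1 for k = 1, else 0
    A = np.vander(offsets, N=2 * m + 1, increasing=True).T  # A[k, j] = j**k
    b = np.zeros(2 * m + 1)
    b[1] = 1.0
    return np.linalg.solve(A, b)

for order in (4, 6, 8):
    c = centered_first_derivative_coeffs(order)
    # 4th order uses 5 points; 8th order already needs 9 points per direction
    print(order, len(c), np.round(c, 6))
```

For 8th order this recovers the familiar coefficients 4/5, -1/5, 4/105, -1/280 at offsets +1..+4 (antisymmetric on the negative side), a 9-point stencil per direction.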
> --
> Ian Hinder
> http://members.aei.mpg.de/ianhin
>
> On 4 Jul 2015, at 17:04, Erik Schnetter <[email protected]> wrote:
>
>> On Sat, Jul 4, 2015 at 10:21 AM, Ian Hinder <[email protected]> wrote:
>>
>> On 3 Jul 2015, at 22:38, Erik Schnetter <[email protected]> wrote:
>>
>>> I ran the Simfactory benchmark for ML_BSSN on both the current version and
>>> the "rewrite" branch to see whether this branch is ready for production
>>> use. I ran this benchmark on a single node of Shelob at LSU. In both cases,
>>> using 2 OpenMP threads and 8 MPI processes per node was fastest, so I am
>>> reporting these results below. Since I was interested in the performance of
>>> McLachlan, this is a unigrid vacuum benchmark using fourth order
>>> differencing.
>>>
>>> One noteworthy difference is that dissipation as implemented in the
>>> "rewrite" branch is finally approximately as fast as thorn Dissipation, and
>>> I have thus used this option for the "rewrite" branch.
>>>
>>> Here are the high-level results:
>>>
>>> current: 3.03136e-06 sec per grid point
>>> rewrite: 2.85734e-06 sec per grid point
>>>
>>> That is, the rewrite branch is about 5% faster.
>>
>> Hi Erik,
>>
>> That is very reassuring! However, for production use, I would be more
>> interested in 6th or 8th order finite differencing (where the advection
>> stencils become very large), and with Jacobians. If 8th order with
>> Jacobians is at least a similar speed with the rewrite branch, then I would
>> be happy with switching.
>>
>> Ian
>>
>> Do you want to suggest a particular benchmark parameter file?
>>
>> -erik
>>
>> --
>> Erik Schnetter <[email protected]>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/

--
Ian Hinder
http://members.aei.mpg.de/ianhin
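For reference, the relative changes implied by the timings quoted in this thread can be computed directly. A quick sketch (the numbers are taken verbatim from the messages above):

```python
# Compare two benchmark timings (seconds per grid point) as a relative change.
def relative_change(old, new):
    """Fractional change from old to new; negative means a speedup."""
    return (new - old) / old

# Thornburg04 / 8th order benchmark (this message):
print(f"{relative_change(8.53935e-06, 8.55188e-06):+.2%}")  # +0.15%, essentially equal

# Earlier unigrid 4th order benchmark (quoted below in the thread):
print(f"{relative_change(3.03136e-06, 2.85734e-06):+.2%}")  # -5.74%, i.e. "about 5% faster"
```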
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
