Hi Ian,

You are right, I have 30 cores, not 30 CPUs. They are in fact 30 virtual cores, because I am running the simulation on a 10-VM cluster (one vCPU with 3 cores per VM) built with RHEL7 + OpenMPI. Many thanks for your suggestion; I am going to try it and see if I can reach 10 M/hr (the current value is 7 M/hr).
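(A quick back-of-the-envelope sketch for translating that speed into wall-clock time; the final_time value below is only a placeholder, not the one computed in GW150914.rpar:)

    # Rough wall-clock estimate from the reported physical_time_per_hour.
    # final_time is a placeholder; substitute the value your run actually uses.
    final_time = 1700.0          # total coordinate time to evolve, in M (placeholder)
    for speed in (7.0, 10.0):    # simulation speed in M per hour
        hours = final_time / speed
        print(f"{speed:4.1f} M/hr -> {hours:6.1f} h ({hours/24:.1f} days)")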
Cheers,
Benja

On Wed, Oct 24, 2018 at 8:02 PM <[email protected]> wrote:

> On 23 Oct 2018, at 08:57, Benjamin Chardi Marco <[email protected]> wrote:
>
>> Dear friends,
>>
>> We are trying to use the EinsteinToolKit GW150914.rpar binary black hole
>> merger simulation as a use case to test that our container orchestration
>> product OpenShift can be used for HPC.
>> Our test environment only has 30 CPUs, so we need to execute that
>> simulation in a reasonable time.
>
> Hi,
>
> 30 CPUs is quite a lot; do you really mean 30 CPUs, or 30 cores? What CPU
> are you using, and how many cores does it have? Also, what is the
> interconnect between the nodes? Infiniband, omnipath, gigabit ethernet,
> etc?
>
>> Please can you tell us how to modify GW150914.rpar in order to get a less
>> precise simulation executed on a 30-CPU cluster in a reasonable time
>> (~ few days).
>
> You can run at a lower resolution by changing the
>
> --define N 28
>
> to something else. This must be a multiple of 4, and can probably go as
> low as 20 without the simulation crashing. [Roland: you mentioned 24 in
> your email. Did you try 20 and it crashed? I seem to remember 20 working
> at one point.] This is a measure of the number of grid cells across the
> black holes, so increasing it gives you more cells, higher resolution, and
> the run goes more slowly.
>
> Roland's suggestions are also good, but I would make a couple of changes
> to what he recommended.
>
> The original boundary condition in GW150914.rpar sets the time derivative
> ("rhs") of all fields to 0 on the outer spherical boundary. This fixes the
> fields to their initial value, so it can be considered a Dirichlet (or
> "scalar") boundary condition:
>
> ML_BSSN::rhs_boundary_condition = "scalar"
>
> This is in general a bad thing to do (you will get almost perfect
> reflections of all outgoing waves), but the boundary was placed far enough
> away that it could not influence the waveform. This is generally very
> cheap with the spherical outer grid used here, and it was done because we
> had not implemented radiative boundary conditions that worked with the
> spherical grids in McLachlan at the time.
>
> The improvement that I believe Roland meant to make was to change the
> boundary condition to radiative (not Robin), which has now been
> implemented in the code. This makes the fields obey an advection-type
> equation on the outer boundary, assuming that all fields are solutions to
> outgoing radial wave equations. In Roland's parameter file, he set
>
> NewRad::z_is_radial = yes
>
> but this is a technical change to trick NewRad into working with the
> spherical grids that we use here. To change the boundary condition itself,
> you need to set
>
> ML_BSSN::rhs_boundary_condition = "NewRad"
>
> rather than "scalar".
>
> The other change Roland made was to change final_time to half its current
> value:
>
> final_time = waveform_length + outermost_detector
> ->
> final_time = 0.5*(waveform_length + outermost_detector)
>
> This doesn't seem correct. It is true that final_time is used to set
> sphere_outer_radius, but this will not halve the size of the domain.
> Further, it will halve the runtime of the simulation, so the simulation
> will stop before the BHs have merged. Instead, I would change
> sphere_outer_radius as follows:
>
> -sphere_outer_radius = int((outermost_detector + final_time)/(i*hr))*i*hr
> +sphere_outer_radius = int((1000)/(i*hr))*i*hr
>
> This might make the waveform noisier, but with the changed boundary
> condition, it shouldn't be too bad.
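(As a quick illustration of what that expression does: it truncates the target radius down to an integer multiple of the coarse radial grid spacing, so the outer boundary lands exactly on a grid point. The values of i and hr below are placeholders, not the ones defined in GW150914.rpar:)

    # Sketch of the rounding in the suggested sphere_outer_radius line.
    # i and hr are placeholders; in the parameter file they come from the
    # resolution settings derived from the N define.
    i, hr = 4, 2.0
    sphere_outer_radius = int(1000/(i*hr))*i*hr   # largest multiple of i*hr <= 1000
    print(sphere_outer_radius)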
>> Now we can run the simulation GW150914.rpar using OpenMPI +
>> EinsteinToolKit, but it takes so long to be executed (~ weeks).
>
> This sounds like quite a long time! It sounds too long. On the page
> describing the simulation,
> https://einsteintoolkit.org/gallery/bbh/index.html, it says that the
> simulation takes 2.8 days on 128 cores of an Intel(R) Xeon(R) CPU E5-2630
> v3 @ 2.40GHz (Haswell). Assuming that you mean you are using 30 cores, and
> if you are using a similar CPU, then it should take 2.8 * 128/30 = 11.9
> days. Is this about what you see? What speed is reported? You can see
> this in the output file GW150914_*.out:
>
> ----------------------------------------------------------------------------------------------------------
> Iteration      Time | *me_per_hour | ML_BSSN::phi          | *TISTICS::maxrss_mb | *TICS::swap_used_mb
>                     |              | minimum     maximum   | minimum    maximum  | minimum    maximum
> ----------------------------------------------------------------------------------------------------------
>    114640   246.602 |   10.5254966 | 0.0149352   0.9995490 |    3748       5289 |       0          0
>    114644   246.611 |   10.5255173 | 0.0144565   0.9995490 |    3748       5289 |       0          0
>
> The third column is the speed of the simulation in coordinate time per
> hour (it is a truncation of "physical_time_per_hour").
>
> It's possible that the OpenMP or MPI configuration is not correct. Please
> could you post the standard output file (GW150914_*.out) to
> https://pastebin.com so we can take a look at it?
>
>> We believe that GW150914.rpar EinsteinToolKit is a great use case to test
>> OpenShift for HPC, and of course we will reference EinsteinToolKit in our
>> final report as a use case for OpenShift in HPC mode.
>
> Great; it sounds interesting! There are instructions for the papers which
> should be cited if you use this parameter file and code at the top of the
> parameter file:
>
> # Copyright Barry Wardell, Ian Hinder, Eloisa Bentivegna
>
> # We ask that if you make use of the parameter file or the example
> # data, then please cite
>
> # Simulation of GW150914 binary black hole merger using the
> # Einstein Toolkit - https://doi.org/10.5281/zenodo.155394
>
> # as well as the Einstein Toolkit, the Llama multi-block
> # infrastructure, the Carpet mesh-refinement driver, the apparent
> # horizon finder AHFinderDirect, the TwoPunctures initial data code,
> # QuasiLocalMeasures, Cactus, and the McLachlan spacetime evolution
> # code, the Kranc code generation package, and the Simulation Factory.
>
> # An appropriate bibtex file, etgw150914.bib, is provided with this
> # parameter file.
>
> and the bibtex file is at
> https://einsteintoolkit.org/gallery/bbh/etgw150914.bib.
>
> --
> Ian Hinder
> https://ianhinder.net

--
Benjamín Chardí Marco
Senior Red Hat Consultant
RHCE #100-107-341
[email protected]
Mobile: 0034 654 344 878
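(For reference, a short Python sketch of the core-count scaling estimate quoted above, plus a rough way to pull the speed column out of the standard output file. The file name is a placeholder, and the assumption that the speed sits in the second "|"-separated field is taken from the table layout shown above; it may differ in other configurations.)

    # Ideal strong-scaling estimate (real runs will scale somewhat worse).
    reference_days, reference_cores = 2.8, 128    # gallery run on Xeon E5-2630 v3
    print(reference_days * reference_cores / 30)  # ~11.9 days on 30 cores

    # Rough extraction of the physical_time_per_hour column from the .out file.
    # "GW150914_1.out" is a placeholder name; the column layout is assumed to
    # match the table shown above.
    with open("GW150914_1.out") as f:
        for line in f:
            parts = line.split("|")
            first = parts[0].split()
            if len(parts) > 1 and len(first) == 2 and first[0].isdigit():
                iteration, coord_time = first
                print(iteration, coord_time, float(parts[1]))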
