Hello Lorenzo,

If the issue is really "use MPI-IO" and not "use *parallel* file access", then you can give the branch "rhaas/mpiio" of Carpet a try.
It introduces a new parameter CarpetIOHDF5::user_MPIIO that makes Carpet instruct HDF5 to use MPI-IO instead of just POSIX/Unix file IO calls. This is *not* parallel IO (ie the same number of output files, with each process acting independently), but it would bring in eg any awareness that MPI-IO might have about Lustre that POSIX IO does not have. This would technically address the issue that Frontera support seems to have brought up, though I feel that it does not really address anything (ie there is still no parallel IO at all).

Yours,
Roland

> Hello Roland, all,
> could it be possible for you to have a quick look at the parameter file I
> am using (attached) to check if there is anything manifestly
> wrong/unsafe/unrecommended with checkpointing or with other I/O options? In
> case there are any issues, I can then take care of them and report back to
> the Frontera people.
>
> Thank you very much in advance,
> Lorenzo
>
> On Thu, 6 Oct 2022 at 15:29, Roland Haas <rh...@illinois.edu> wrote:
>
> > Hello Lorenzo,
> >
> > TACC saved a bit of money on the IO system on Frontera :-) and thus
> > they now need to fix bugs in documentation.
> >
> > Yours,
> > Roland
> >
> > > Hi Roland,
> > > thank you, your suggestions are very useful. I was running one process
> > > per core on more than 200 cores, so that may be part of the issue. Also,
> > > I will try the one_file_per_group or one_file_per_rank options to reduce
> > > the performance impact.
> > >
> > > The cluster I'm running on is Frontera, and the guidelines to manage I/O
> > > operations properly on it are here
> > > https://portal.tacc.utexas.edu/tutorials/managingio
> > > in case people are interested. I will follow them as closely as I can to
> > > avoid similar problems in the future.
> > >
> > > Thank you very much again,
> > > Lorenzo
> > >
> > > On Thu, 6 Oct 2022 at 12:52, Roland Haas <rh...@illinois.edu> wrote:
> > >
> > > > Hello Lorenzo,
> > > >
> > > > Unfortunately, Carpet will always write one checkpoint file per MPI
> > > > rank; there is no way to change that.
> > > >
> > > > As you learned, the option out_proc_every only affects the out3D_vars
> > > > output (and possibly out_vars 3D output) but never checkpoints.
> > > >
> > > > In my opinion, it should be impossible to stress the file system of a
> > > > reasonably provisioned cluster with the checkpoints. Even when running
> > > > on 32k MPI ranks (and 4k nodes) on BW, checkpoint-recovery was very
> > > > quick (1 min or so) and barely made a blip on the system monitoring
> > > > radar. Any cluster with sufficiently many nodes to run at scale at
> > > > 1 file per rank (for a sane number of ranks, ie with some OpenMP
> > > > threads) should have a file system capable of taking checkpoints. Of
> > > > course, 1 rank per core is no longer "sane" once you go beyond a
> > > > couple hundred cores.
> > > >
> > > > Now, writing 1 file per output variable and per MPI rank may be a
> > > > different thing...
> > > > In that case, out_proc_every should help with out3D_vars. I would also
> > > > suggest one_file_per_group or even one_file_per_rank for this (see
> > > > CarpetIOHDF5's param.ccl), which will have less of a performance
> > > > impact (no communication) than out_proc_every != 1.
> > > >
> > > > If the issue is opening many files (again, only for out3D_vars regular
> > > > output), then you may also see benefits from the different options in:
> > > >
> > > > https://bitbucket.org/eschnett/carpet/pull-requests/34
> > > > https://bitbucket.org/einsteintoolkit/tickets/issues/2364
> > > >
> > > > Yours,
> > > > Roland
> > > >
> > > > > Hello,
> > > > > in order to avoid stressing the filesystem on the cluster I'm
> > > > > running on, I was advised to avoid writing one output/checkpoint
> > > > > file per MPI process and instead to collect data from multiple
> > > > > processes before outputting/checkpointing happens. I found that the
> > > > > combination of parameters
> > > > >
> > > > > IO::out_mode = "np"
> > > > > IO::out_proc_every = 8
> > > > >
> > > > > does the job for output files, but I still have one checkpoint file
> > > > > per process. Is there a similar parameter, or combination of
> > > > > parameters, which can be used for checkpoint files?
> > > > >
> > > > > Thank you very much,
> > > > > Lorenzo Ennoggi
> > > >
> > > > --
> > > > My email is as private as my paper mail. I therefore support
> > > > encrypting and signing email messages. Get my PGP key from
> > > > http://keys.gnupg.net.
> >
> > --
> > My email is as private as my paper mail. I therefore support encrypting
> > and signing email messages.
> > Get my PGP key from http://keys.gnupg.net.

--
My email is as private as my paper mail. I therefore support encrypting and signing email messages. Get my PGP key from http://keys.gnupg.net.
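For reference, the output-grouping options discussed in this thread could be combined in a parameter-file fragment along these lines (a sketch only: the value 8 is taken from Lorenzo's example, and the parameter names and defaults should be checked against IOUtil's and CarpetIOHDF5's param.ccl before use):

```
# Regular 3D output: let every 8th process collect and write data
IO::out_mode       = "np"
IO::out_proc_every = 8

# Alternatively, reduce the number of output files without the
# communication cost of out_proc_every != 1:
CarpetIOHDF5::one_file_per_group = "yes"

# Note: none of these affect checkpoints; Carpet always writes
# one checkpoint file per MPI rank.
```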
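Similarly, on the "rhaas/mpiio" branch mentioned at the top of the thread, the new parameter would presumably be enabled like this (hypothetical sketch: the parameter name is copied verbatim from the email and should be verified against the branch's param.ccl):

```
# Ask HDF5 to use MPI-IO instead of plain POSIX file IO calls
# (still one output file per process, ie not parallel IO)
CarpetIOHDF5::user_MPIIO = "yes"
```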
_______________________________________________
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users