Hi Dan,

On 26/08/2016 08:53, Dan Ibanez wrote:
> I apologize in advance if this is the wrong forum
> for this question, but I recently read that PyFR
> is a Gordon Bell finalist for having run on the Titan machine.
> I also recall a year or so ago hearing about significant
> difficulties in running Python at large scale on supercomputers.
> The specific issue I recall has to do with Python's
> module import doing filesystem searches from all
> compute nodes at once, stressing the filesystem.
> Have issues like this been resolved in a general sense,
> or via workarounds?
> What is the current outlook for others to develop Python code
> that can scale on leadership-class supercomputers?
> Anyway, congratulations and thank you in advance for
> any clarifications you can provide.
The problems often encountered when attempting to use Python on large computing clusters stem from the fact that parallel file systems are ill-equipped to deal with the large number of metadata operations generated by both the Python interpreter and the dynamic loader. These issues are perennial and are unlikely to be resolved in the near term; handling millions of metadata IOPS is an extremely challenging problem. Nevertheless, there is a range of means by which these issues can be mitigated.

The first is to ensure that as much of the stack (Python + modules + application) as possible is pushed out to the nodes themselves. One means of accomplishing this is by ensuring that the relevant dependencies are made available as modules. Further, some systems provide means through which a user can arrange to have application data pushed directly onto the local disks of the compute nodes. Doing this ensures that much of the start-up related I/O is purely local, and hence scalable.

For portions of the stack where this is either not possible or impractical (say because the application code is undergoing constant revision and hence the overheads associated with getting it built as a module are substantial), the next best thing is to pack all remaining portions into a compressed archive. The job script is then instructed to copy this archive from the parallel file system onto a region of scratch space on each compute node. This can be /tmp or something fancier. The archive can then be extracted and the application run from there; a rough sketch of this staging step is given below. Here the majority of the I/O operations are reads of a single, reasonably sized file, something a parallel file system is far better suited to. With this approach care must be taken if there are to be multiple jobs per node; however, there are certainly ways of managing this.

A combination of these two approaches allows a Python application to scale nearly indefinitely without overloading the metadata server. Crucially, neither requires LD_PRELOAD-type hacks or any kind of special software; they can be deployed directly on the majority of clusters we have encountered.
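As an illustration, here is a minimal sketch of the archive-staging step done from within the application itself using mpi4py (which, of course, must already be importable, say from a centrally installed module). The archive path, scratch directory, and site-packages layout are assumptions made for the purpose of the example; the same copy-and-extract logic can equally live in the job script before the application is launched.

    # Minimal sketch: stage a pre-built archive of the Python stack onto
    # node-local scratch, one copy per node, before importing from it.
    # The paths below (ARCHIVE, SCRATCH) are hypothetical.
    import os
    import shutil
    import sys
    import tarfile

    from mpi4py import MPI

    ARCHIVE = '/lustre/user/pyenv.tar.gz'   # on the parallel file system
    SCRATCH = '/tmp/pyenv'                  # node-local scratch space

    # Group ranks by node so that exactly one rank per node does the staging
    node = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)

    if node.rank == 0:
        os.makedirs(SCRATCH, exist_ok=True)

        # One large, sequential read from the parallel file system...
        local = os.path.join(SCRATCH, os.path.basename(ARCHIVE))
        shutil.copy(ARCHIVE, local)

        # ...after which all metadata operations are purely node-local
        with tarfile.open(local) as tf:
            tf.extractall(SCRATCH)

    # Wait for the node-local copy to be ready
    node.Barrier()

    # Resolve subsequent imports against the local copy
    sys.path.insert(0, os.path.join(SCRATCH, 'site-packages'))

If multiple jobs can share a node then SCRATCH is best made job-specific, for instance by including the scheduler's job ID in the path, so that concurrent jobs do not trample on one another's copies and clean-up remains straightforward.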
Regards, Freddie.
