Re: [gmx-users] strange GPU load distribution
Hi Szilárd,

It really does appear that GMX_DISABLE_GPU_DETECTION=1 in the user's .bashrc fixed it right up. We haven't tried his runs alongside GPU-accelerated jobs yet, but he reports that none of his PIDs ever appear in nvidia-smi anymore, and overall his jobs start much faster. This was an excellent suggestion, thank you.

Alex

On 5/7/2018 2:54 PM, Szilárd Páll wrote: Hi, You have at least one option more elegant than using a separate binary for EM. Set the GMX_DISABLE_GPU_DETECTION=1 environment variable, which is the internal GROMACS override that forces detection off for cases similar to yours. That should solve the detection latency. If for some reason it does not, you can always set CUDA_VISIBLE_DEVICES="" so jobs simply do not "see" any GPUs. This is a standard environment variable for the CUDA runtime. Let us know if that worked. Cheers. -- Szilárd

On Mon, May 7, 2018 at 9:38 AM, Alex wrote: Thanks Mark. No need to be sorry, a CPU-only build is a simple enough fix. Inelegant, but if it works, it's all good. I'll report as soon as we have tried. I myself run things in a way that you would find very familiar, but we have a colleague developing forcefields, and that involves tons of very short CPU-only runs getting submitted in bursts. Hopefully, one day you'll be able to accommodate this scenario. :) Alex

On 5/7/2018 1:13 AM, Mark Abraham wrote: Hi, I don't see any problems there, but I note that there are run-time settings for the driver/runtime to block until no other process is using the GPU, which may be a contributing factor here. As Justin noted, if your EM jobs would use a build of GROMACS that is not configured to have access to the GPUs, then there can be no problem. I recommend you do that if you want to continue sharing this node between GPU and non-GPU jobs. There has long been the principle that users must take active steps to keep GROMACS processes away from each other when sharing CPU resources, and this is a similar situation. In the abstract, it would be reasonable to organize mdrun so that it determines whether it would even want a GPU before running the GPU detection; however, that high-level code is in considerable flux in development branches, and we are highly unlikely to prioritise such a fix in a stable release branch to suit this use case. I didn't think that some of the reorganization since the 2016 release would have this effect, but apparently it can. Sorry! Mark

On Mon, May 7, 2018 at 6:33 AM Alex wrote: Mark, I am forwarding the response I received from the colleague who prepared the box for my GMX install -- this is from the latest installation of 2018.1. See the text below and please let me know what you think. We have no problem rebuilding things, but would like to understand what is wrong before we pause all the work.
Thank you, Alex "OS Ubuntu 16.04LTS After checking gcc and kernel-headers were installed I ran the following #sudo lspci |grep -i nvidia #curl http://developer.download.nvidia.com/compute/cuda/repos/ubun tu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo apt-key adv --fetch-keyshttp:// developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/ x86_64/7fa2af80.pub #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo apt update #sudo apt upgrade #sudo /sbin/shutdown -r now After reboot #sudo apt-get install cuda #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} #nvidia-smi I also compiled the samples in the cuda tree using the Makefile there and had no problems." -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org. -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support /Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org. -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.
Re: [gmx-users] strange GPU load distribution
I think we have everything ready at this point: a separate binary (not sourced yet), and these options. We've set GMX_DISABLE_GPU_DETECTION=1 in the user's .bashrc and will try the other option, if this one fails. Will update here on the bogging down situation. Thanks a lot. Alex On Mon, May 7, 2018 at 2:54 PM, Szilárd Pállwrote: > Hi, > > You have at least one option more elegant than using a separate binary for > EM. > > Set GMX_DISABLE_GPU_DETECTION=1 environment variable which is the internal > GROMACS override that forces the detection off for cases similar to > yours. That should solve the detection latency. If for some reason it does > not, you can always set CUDA_VISIBLE_DEVICES="" so jobs simply do not "see" > any GPUs. This is a standard environment variable for the CUDA runtime. > > Let us know if that worked. > > Cheers. > > -- > Szilárd > > On Mon, May 7, 2018 at 9:38 AM, Alex wrote: > > > Thanks Mark. No need to be sorry, a CPU-only build is a simple enough > fix. > > Inelegant, but if it works, it's all good. I'll report as soon as we have > > tried. > > > > I myself run things in a way that you would find very familiar, but we > > have a colleague developing forcefields and that involves tons of very > > short CPU-only runs getting submitted in bursts. Hopefully, one day > you'll > > be able to accommodate this scenario. :) > > > > Alex > > > > > > > > On 5/7/2018 1:13 AM, Mark Abraham wrote: > > > >> Hi, > >> > >> I don't see any problems there, but I note that there are run-time > >> settings > >> for the driver/runtime to block until no other process is using the GPU, > >> which may be a contributing factor here. > >> > >> As Justin noted, if your EM jobs would use a build of GROMACS that is > not > >> configured to have access to the GPUs, then there can be no problem. I > >> recommend you do that if you want to continue sharing this node between > >> GPU > >> and non-GPU jobs. There has long been the principle that users must take > >> active steps to keep GROMACS processes away from each other when sharing > >> CPU resources, and this is a similar situation. > >> > >> In the abstract, it would be reasonable to organize mdrun so that we > >> determine that we might want to use a GPU if we have one before we run > the > >> GPU detection, however that high-level code is in considerable flux in > >> development branches, and we are highly unlikely to prioritise such a > fix > >> in a stable release branch to suit this use case. I didn't think that > some > >> of the reorganization since 2016 release would have this effect, but > >> apparently it can. Sorry! > >> > >> Mark > >> > >> On Mon, May 7, 2018 at 6:33 AM Alex wrote: > >> > >> Mark, > >>> > >>> I am forwarding the response I received from the colleague who prepared > >>> the box for my GMX install -- this is from the latest installation of > >>> 2018.1. See text below and please let me know what you think. We have > no > >>> problem rebuilding things, but would like to understand what is wrong > >>> before we pause all the work. 
> >>> > >>> Thank you, > >>> > >>> Alex > >>> > >>> "OS Ubuntu 16.04LTS > >>> > >>> After checking gcc and kernel-headers were installed I ran the > following > >>> > >>> #sudo lspci |grep -i nvidia > >>> > >>> #curl > >>> > >>> http://developer.download.nvidia.com/compute/cuda/repos/ubun > >>> tu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > >>> > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > >>> > >>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > >>> > >>> #sudo apt-key adv > >>> --fetch-keyshttp:// > >>> developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/ > >>> x86_64/7fa2af80.pub > >>> > >>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > >>> > >>> #sudo apt update > >>> > >>> #sudo apt upgrade > >>> > >>> #sudo /sbin/shutdown -r now > >>> > >>> After reboot > >>> > >>> #sudo apt-get install cuda > >>> > >>> #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} > >>> > >>> #nvidia-smi > >>> > >>> I also compiled the samples in the cuda tree using the Makefile there > >>> and had no problems." > >>> > >>> -- > >>> Gromacs Users mailing list > >>> > >>> * Please search the archive at > >>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before > >>> posting! > >>> > >>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists > >>> > >>> * For (un)subscribe requests visit > >>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or > >>> send a mail to gmx-users-requ...@gromacs.org. > >>> > >>> > > -- > > Gromacs Users mailing list > > > > * Please search the archive at http://www.gromacs.org/Support > > /Mailing_Lists/GMX-Users_List before posting! > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists > > > > * For (un)subscribe requests visit > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or > > send a mail to
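If the .bashrc setting turns out to be too broad, a per-job alternative is to set the override only for the mdrun invocation, for example (a sketch; em_steep stands in for whatever -deffnm the script actually uses):

GMX_DISABLE_GPU_DETECTION=1 gmx mdrun -nt 1 -nb cpu -pme cpu -deffnm em_steep

or to export it at the top of the script that submits the EM bursts, so that other shells and other users are unaffected.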
Re: [gmx-users] strange GPU load distribution
Hi, You have at least one option more elegant than using a separate binary for EM. Set GMX_DISABLE_GPU_DETECTION=1 environment variable which is the internal GROMACS override that forces the detection off for cases similar to yours. That should solve the detection latency. If for some reason it does not, you can always set CUDA_VISIBLE_DEVICES="" so jobs simply do not "see" any GPUs. This is a standard environment variable for the CUDA runtime. Let us know if that worked. Cheers. -- Szilárd On Mon, May 7, 2018 at 9:38 AM, Alexwrote: > Thanks Mark. No need to be sorry, a CPU-only build is a simple enough fix. > Inelegant, but if it works, it's all good. I'll report as soon as we have > tried. > > I myself run things in a way that you would find very familiar, but we > have a colleague developing forcefields and that involves tons of very > short CPU-only runs getting submitted in bursts. Hopefully, one day you'll > be able to accommodate this scenario. :) > > Alex > > > > On 5/7/2018 1:13 AM, Mark Abraham wrote: > >> Hi, >> >> I don't see any problems there, but I note that there are run-time >> settings >> for the driver/runtime to block until no other process is using the GPU, >> which may be a contributing factor here. >> >> As Justin noted, if your EM jobs would use a build of GROMACS that is not >> configured to have access to the GPUs, then there can be no problem. I >> recommend you do that if you want to continue sharing this node between >> GPU >> and non-GPU jobs. There has long been the principle that users must take >> active steps to keep GROMACS processes away from each other when sharing >> CPU resources, and this is a similar situation. >> >> In the abstract, it would be reasonable to organize mdrun so that we >> determine that we might want to use a GPU if we have one before we run the >> GPU detection, however that high-level code is in considerable flux in >> development branches, and we are highly unlikely to prioritise such a fix >> in a stable release branch to suit this use case. I didn't think that some >> of the reorganization since 2016 release would have this effect, but >> apparently it can. Sorry! >> >> Mark >> >> On Mon, May 7, 2018 at 6:33 AM Alex wrote: >> >> Mark, >>> >>> I am forwarding the response I received from the colleague who prepared >>> the box for my GMX install -- this is from the latest installation of >>> 2018.1. See text below and please let me know what you think. We have no >>> problem rebuilding things, but would like to understand what is wrong >>> before we pause all the work. >>> >>> Thank you, >>> >>> Alex >>> >>> "OS Ubuntu 16.04LTS >>> >>> After checking gcc and kernel-headers were installed I ran the following >>> >>> #sudo lspci |grep -i nvidia >>> >>> #curl >>> >>> http://developer.download.nvidia.com/compute/cuda/repos/ubun >>> tu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb >>> > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb >>> >>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb >>> >>> #sudo apt-key adv >>> --fetch-keyshttp:// >>> developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/ >>> x86_64/7fa2af80.pub >>> >>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb >>> >>> #sudo apt update >>> >>> #sudo apt upgrade >>> >>> #sudo /sbin/shutdown -r now >>> >>> After reboot >>> >>> #sudo apt-get install cuda >>> >>> #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} >>> >>> #nvidia-smi >>> >>> I also compiled the samples in the cuda tree using the Makefile there >>> and had no problems." 
Re: [gmx-users] strange GPU load distribution
Thanks Mark. No need to be sorry, a CPU-only build is a simple enough fix. Inelegant, but if it works, it's all good. I'll report as soon as we have tried. I myself run things in a way that you would find very familiar, but we have a colleague developing forcefields and that involves tons of very short CPU-only runs getting submitted in bursts. Hopefully, one day you'll be able to accommodate this scenario. :) Alex On 5/7/2018 1:13 AM, Mark Abraham wrote: Hi, I don't see any problems there, but I note that there are run-time settings for the driver/runtime to block until no other process is using the GPU, which may be a contributing factor here. As Justin noted, if your EM jobs would use a build of GROMACS that is not configured to have access to the GPUs, then there can be no problem. I recommend you do that if you want to continue sharing this node between GPU and non-GPU jobs. There has long been the principle that users must take active steps to keep GROMACS processes away from each other when sharing CPU resources, and this is a similar situation. In the abstract, it would be reasonable to organize mdrun so that we determine that we might want to use a GPU if we have one before we run the GPU detection, however that high-level code is in considerable flux in development branches, and we are highly unlikely to prioritise such a fix in a stable release branch to suit this use case. I didn't think that some of the reorganization since 2016 release would have this effect, but apparently it can. Sorry! Mark On Mon, May 7, 2018 at 6:33 AM Alexwrote: Mark, I am forwarding the response I received from the colleague who prepared the box for my GMX install -- this is from the latest installation of 2018.1. See text below and please let me know what you think. We have no problem rebuilding things, but would like to understand what is wrong before we pause all the work. Thank you, Alex "OS Ubuntu 16.04LTS After checking gcc and kernel-headers were installed I ran the following #sudo lspci |grep -i nvidia #curl http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo apt-key adv --fetch-keyshttp:// developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb #sudo apt update #sudo apt upgrade #sudo /sbin/shutdown -r now After reboot #sudo apt-get install cuda #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} #nvidia-smi I also compiled the samples in the cuda tree using the Makefile there and had no problems." -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org. -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.
Re: [gmx-users] strange GPU load distribution
Hi, I don't see any problems there, but I note that there are run-time settings for the driver/runtime to block until no other process is using the GPU, which may be a contributing factor here. As Justin noted, if your EM jobs would use a build of GROMACS that is not configured to have access to the GPUs, then there can be no problem. I recommend you do that if you want to continue sharing this node between GPU and non-GPU jobs. There has long been the principle that users must take active steps to keep GROMACS processes away from each other when sharing CPU resources, and this is a similar situation. In the abstract, it would be reasonable to organize mdrun so that we determine that we might want to use a GPU if we have one before we run the GPU detection, however that high-level code is in considerable flux in development branches, and we are highly unlikely to prioritise such a fix in a stable release branch to suit this use case. I didn't think that some of the reorganization since 2016 release would have this effect, but apparently it can. Sorry! Mark On Mon, May 7, 2018 at 6:33 AM Alexwrote: > Mark, > > I am forwarding the response I received from the colleague who prepared > the box for my GMX install -- this is from the latest installation of > 2018.1. See text below and please let me know what you think. We have no > problem rebuilding things, but would like to understand what is wrong > before we pause all the work. > > Thank you, > > Alex > > "OS Ubuntu 16.04LTS > > After checking gcc and kernel-headers were installed I ran the following > > #sudo lspci |grep -i nvidia > > #curl > > http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > > #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > > #sudo apt-key adv > --fetch-keyshttp:// > developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub > > #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > > #sudo apt update > > #sudo apt upgrade > > #sudo /sbin/shutdown -r now > > After reboot > > #sudo apt-get install cuda > > #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} > > #nvidia-smi > > I also compiled the samples in the cuda tree using the Makefile there > and had no problems." > > -- > Gromacs Users mailing list > > * Please search the archive at > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before > posting! > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists > > * For (un)subscribe requests visit > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or > send a mail to gmx-users-requ...@gromacs.org. > -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.
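A minimal sketch of the CPU-only build Justin and Mark refer to, assuming the 2018.1 source tarball and an install prefix of /opt/gromacs-2018.1-cpu (the prefix and -j count are placeholders):

tar xf gromacs-2018.1.tar.gz
cd gromacs-2018.1
mkdir build-cpu && cd build-cpu
cmake .. -DGMX_GPU=OFF -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2018.1-cpu
make -j 8
sudo make install

Sourcing /opt/gromacs-2018.1-cpu/bin/GMXRC in the EM user's job scripts then gives a gmx binary that cannot touch the GPUs at all, while the GPU-enabled build remains available to everyone else.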
Re: [gmx-users] strange GPU load distribution
Mark,

I am forwarding the response I received from the colleague who prepared the box for my GMX install -- this is from the latest installation of 2018.1. See the text below and please let me know what you think. We have no problem rebuilding things, but would like to understand what is wrong before we pause all the work.

Thank you,

Alex

"OS Ubuntu 16.04 LTS

After checking that gcc and the kernel headers were installed, I ran the following:

#sudo lspci | grep -i nvidia
#curl http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
#sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
#sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
#sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
#sudo apt update
#sudo apt upgrade
#sudo /sbin/shutdown -r now

After reboot:

#sudo apt-get install cuda
#export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
#nvidia-smi

I also compiled the samples in the CUDA tree using the Makefile there and had no problems."
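A quick way to confirm that the driver and runtime agree after an install like this is to build and run the deviceQuery sample, e.g. (assuming the default CUDA 9.1 samples location):

cd /usr/local/cuda-9.1/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

It should list every GPU and finish with Result = PASS; a hang or a long pause at this step would point at the driver/runtime rather than at GROMACS.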
Re: [gmx-users] strange GPU load distribution
Hi Mark, I forwarded your email to the person who installed CUDA on our boxes. Just to be clear, there is no persistent occupancy of the GPUs _after_ the process has finished. The observation is as follows: EM jobs submitted > low CPU use by the EM jobs, GPUs bogged down, no output files yet > GPUs released, normal CPU use, output/log files appear, normal completion. I will update as soon as I know more. We seem to have run into a very unpleasant combination (type of jobs submitted + slow GPU init). I recall that when I first reported the issue with the regression test, Szilard suggested that for whatever reason all of our GPU initializations will take longer. We would not have noticed, but we have a user who does a lot of these initializations now. Thanks, Alex On 5/6/2018 5:13 PM, Mark Abraham wrote: Hi, In 2018 and 2018.1, mdrun does indeed run GPU detection and compatibility checks before any logic about whether it should use any GPUs that were in fact detected. However, there's nothing about those checks that should a) take any noticeable time, b) acquire any ongoing resources, or c) lead to persistent occupancy of the GPUs after the simulation process completes. Those combined observations point to something about the installation of the GPUs / runtime / SDK / drivers. What distro are you using? Is there maybe some kind of security feature enabled that could be interfering? Are the GPUs configured to use some kind of process-exclusive mode? Mark On Mon, May 7, 2018 at 12:14 AM Justin Lemkulwrote: On 5/6/18 6:11 PM, Alex wrote: A separate CPU-only build is what we were going to try, but if it succeeds with not touching GPUs, then what -- keep several builds? If your CPU-only run produces something that doesn't touch the GPU (which it shouldn't), that test would rather conclusively state the if the user requests a CPU-only run, then the mdrun code needs to be patched in such a way that GPU detection is not carried out. If that's the case, yes, you'd have to wait for a patch and in the meantime maintain two different mdrun binaries, but it would be a valuable bit of information for the dev team. That latency you mention is definitely there, I think it is related to my earlier report of one of the regression tests failing (I think Mark might remember that one). That failure, by the way, is persistent with 2018.1 we just installed on a completely different machine. I seemed to recall that, which is what got me thinking. -Justin Alex On 5/6/2018 4:03 PM, Justin Lemkul wrote: On 5/6/18 5:51 PM, Alex wrote: Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low, while their PIDs show up in nvidia-smi. After about a minute all goes back to normal. Because the user is doing it frequently (scripted), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS) and that GPU is never touched by these EM jobs. Any ideas will be greatly appreciated. Thinking out loud - a run that explicitly calls for only the CPU to be used might be trying to detect GPU if mdrun is GPU-enabled. Is that a possibility, including any latency in detecting that device? Have you tested to make sure that an mdrun binary that is explicitly disabled from using GPU (-DGMX_GPU=OFF) doesn't affect the GPU usage when running the same command? 
-Justin

Thanks,

Alex

PID TTY STAT TIME COMMAND
60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep

On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark

On Fri, Apr 27, 2018, 21:59 Alex wrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case:

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

On 4/27/2018 1:52 PM, Mark Abraham wrote: The group cutoff scheme can never run on a GPU, so none of that should matter. Use ps and find out what the command lines were. Mark

On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both the EM and the MD run, and the GPUs are being affected _before_ MD starts loading the CPU, i.e. during the initial setting up of the EM run -- CPU load is near zero while nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. The mdrun call and mdp are below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated.

Alex

***

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

mdp:

; Run control
integrator = md-vv ; Velocity Verlet
tinit = 0
dt
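When reproducing the stall, it can help to watch both sides at once; a sketch, using the PID from the ps listing above:

watch -n 1 nvidia-smi
# shows which PIDs currently hold a context on each GPU

ps -o pid,stat,wchan,cmd -p 60432
# state of the stalled mdrun process

The Dl+ value in the STAT column above is the telling detail: D means uninterruptible sleep, i.e. the process is blocked in a kernel/driver call (presumably the CUDA device enumeration) rather than doing useful CPU work.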
Re: [gmx-users] strange GPU load distribution
Hi, In 2018 and 2018.1, mdrun does indeed run GPU detection and compatibility checks before any logic about whether it should use any GPUs that were in fact detected. However, there's nothing about those checks that should a) take any noticeable time, b) acquire any ongoing resources, or c) lead to persistent occupancy of the GPUs after the simulation process completes. Those combined observations point to something about the installation of the GPUs / runtime / SDK / drivers. What distro are you using? Is there maybe some kind of security feature enabled that could be interfering? Are the GPUs configured to use some kind of process-exclusive mode? Mark On Mon, May 7, 2018 at 12:14 AM Justin Lemkulwrote: > > > On 5/6/18 6:11 PM, Alex wrote: > > A separate CPU-only build is what we were going to try, but if it > > succeeds with not touching GPUs, then what -- keep several builds? > > > > If your CPU-only run produces something that doesn't touch the GPU > (which it shouldn't), that test would rather conclusively state the if > the user requests a CPU-only run, then the mdrun code needs to be > patched in such a way that GPU detection is not carried out. If that's > the case, yes, you'd have to wait for a patch and in the meantime > maintain two different mdrun binaries, but it would be a valuable bit of > information for the dev team. > > > That latency you mention is definitely there, I think it is related to > > my earlier report of one of the regression tests failing (I think Mark > > might remember that one). That failure, by the way, is persistent with > > 2018.1 we just installed on a completely different machine. > > I seemed to recall that, which is what got me thinking. > > -Justin > > > > > Alex > > > > > > On 5/6/2018 4:03 PM, Justin Lemkul wrote: > >> > >> > >> On 5/6/18 5:51 PM, Alex wrote: > >>> Unfortunately, we're still bogged down when the EM runs (example > >>> below) start -- CPU usage by these jobs is initially low, while > >>> their PIDs show up in nvidia-smi. After about a minute all goes back > >>> to normal. Because the user is doing it frequently (scripted), > >>> everything is slowed down by a large factor. Interestingly, we have > >>> another user utilizing a GPU with another MD package (LAMMPS) and > >>> that GPU is never touched by these EM jobs. > >>> > >>> Any ideas will be greatly appreciated. > >>> > >> > >> Thinking out loud - a run that explicitly calls for only the CPU to > >> be used might be trying to detect GPU if mdrun is GPU-enabled. Is > >> that a possibility, including any latency in detecting that device? > >> Have you tested to make sure that an mdrun binary that is explicitly > >> disabled from using GPU (-DGMX_GPU=OFF) doesn't affect the GPU usage > >> when running the same command? > >> > >> -Justin > >> > >>> Thanks, > >>> > >>> Alex > >>> > >>> > PID TTY STAT TIME COMMAND > > 60432 pts/8Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 > -nb cpu -pme cpu -deffnm em_steep > > > > >>> > >>> > On 4/27/2018 2:16 PM, Mark Abraham wrote: > > Hi, > > > > What you think was run isn't nearly as useful when troubleshooting as > > asking the kernel what is actually running. > > > > Mark > > > > > > On Fri, Apr 27, 2018, 21:59 Alex wrote: > > > >> Mark, I copied the exact command line from the script, right > >> above the > >> mdp file. 
It is literally how the script calls mdrun in this case: > >> > >> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm > >> > >> > >> On 4/27/2018 1:52 PM, Mark Abraham wrote: > >>> Group cutoff scheme can never run on a gpu, so none of that should > >> matter. > >>> Use ps and find out what the command lines were. > >>> > >>> Mark > >>> > >>> On Fri, Apr 27, 2018, 21:37 Alex wrote: > >>> > Update: we're basically removing commands one by one from the > script > >> that > submits the jobs causing the issue. The culprit is both EM and > the MD > >> run: > and GPUs are being affected _before_ MD starts loading the CPU, > i.e. > >> this > is the initial setting up of the EM run -- CPU load is near zero, > nvidia-smi reports the mess. I wonder if this is in any way > related to > >> that > timing test we were failing a while back. > mdrun call and mdp below, though I suspect they have nothing to > do with > what is happening. Any help will be very highly appreciated. > > Alex > > *** > > gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm > > mdp: > > ; Run control > integrator = md-vv ; Velocity Verlet > tinit= 0 > dt = 0.002 >
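Regarding Mark's question about a process-exclusive mode, that is straightforward to check, and to change, with nvidia-smi (a sketch; changing the mode requires root, and the device index is only an example):

nvidia-smi --query-gpu=index,compute_mode --format=csv
# "Default" allows sharing; "Exclusive Process" serializes access to that GPU

sudo nvidia-smi -i 2 -c DEFAULT
# reset GPU 2 to the shared Default compute mode if needed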
Re: [gmx-users] strange GPU load distribution
On 5/6/18 6:11 PM, Alex wrote: A separate CPU-only build is what we were going to try, but if it succeeds with not touching GPUs, then what -- keep several builds? If your CPU-only run produces something that doesn't touch the GPU (which it shouldn't), that test would rather conclusively state the if the user requests a CPU-only run, then the mdrun code needs to be patched in such a way that GPU detection is not carried out. If that's the case, yes, you'd have to wait for a patch and in the meantime maintain two different mdrun binaries, but it would be a valuable bit of information for the dev team. That latency you mention is definitely there, I think it is related to my earlier report of one of the regression tests failing (I think Mark might remember that one). That failure, by the way, is persistent with 2018.1 we just installed on a completely different machine. I seemed to recall that, which is what got me thinking. -Justin Alex On 5/6/2018 4:03 PM, Justin Lemkul wrote: On 5/6/18 5:51 PM, Alex wrote: Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low, while their PIDs show up in nvidia-smi. After about a minute all goes back to normal. Because the user is doing it frequently (scripted), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS) and that GPU is never touched by these EM jobs. Any ideas will be greatly appreciated. Thinking out loud - a run that explicitly calls for only the CPU to be used might be trying to detect GPU if mdrun is GPU-enabled. Is that a possibility, including any latency in detecting that device? Have you tested to make sure that an mdrun binary that is explicitly disabled from using GPU (-DGMX_GPU=OFF) doesn't affect the GPU usage when running the same command? -Justin Thanks, Alex PID TTY STAT TIME COMMAND 60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. 
Alex

***

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

mdp:

; Run control
integrator = md-vv ; Velocity Verlet
tinit = 0
dt = 0.002
nsteps = 50 ; 1 ns
nstcomm = 100
; Output control
nstxout = 5
nstvout = 5
nstfout = 0
nstlog = 5
nstenergy = 5
nstxout-compressed = 0
; Neighborsearching and short-range nonbonded interactions
cutoff-scheme = group
nstlist = 10
ns_type = grid
pbc = xyz
rlist = 1.4
; Electrostatics
coulombtype = cutoff
rcoulomb = 1.4
; van der Waals
vdwtype = user
vdw-modifier = none
rvdw = 1.4
; Apply long range dispersion corrections for Energy and Pressure
DispCorr = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing = 0.12
; EWALD/PME/PPPM parameters
pme_order = 6
ewald_rtol = 1e-06
epsilon_surface = 0
; Temperature coupling
Tcoupl = nose-hoover
tc_grps = system
tau_t = 1.0
ref_t = some_temperature
; Pressure coupling is off for NVT
Pcoupl = No
tau_p = 0.5
compressibility = 4.5e-05
ref_p = 1.0
; options for bonds
constraints = all-bonds
constraint_algorithm = lincs

On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and
Re: [gmx-users] strange GPU load distribution
A separate CPU-only build is what we were going to try, but if it succeeds with not touching GPUs, then what -- keep several builds? That latency you mention is definitely there, I think it is related to my earlier report of one of the regression tests failing (I think Mark might remember that one). That failure, by the way, is persistent with 2018.1 we just installed on a completely different machine. Alex On 5/6/2018 4:03 PM, Justin Lemkul wrote: On 5/6/18 5:51 PM, Alex wrote: Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low, while their PIDs show up in nvidia-smi. After about a minute all goes back to normal. Because the user is doing it frequently (scripted), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS) and that GPU is never touched by these EM jobs. Any ideas will be greatly appreciated. Thinking out loud - a run that explicitly calls for only the CPU to be used might be trying to detect GPU if mdrun is GPU-enabled. Is that a possibility, including any latency in detecting that device? Have you tested to make sure that an mdrun binary that is explicitly disabled from using GPU (-DGMX_GPU=OFF) doesn't affect the GPU usage when running the same command? -Justin Thanks, Alex PID TTY STAT TIME COMMAND 60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. 
Alex *** gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm mdp: ; Run control integrator = md-vv ; Velocity Verlet tinit = 0 dt = 0.002 nsteps = 50 ; 1 ns nstcomm = 100 ; Output control nstxout = 5 nstvout = 5 nstfout = 0 nstlog = 5 nstenergy = 5 nstxout-compressed = 0 ; Neighborsearching and short-range nonbonded interactions cutoff-scheme = group nstlist = 10 ns_type = grid pbc = xyz rlist = 1.4 ; Electrostatics coulombtype = cutoff rcoulomb = 1.4 ; van der Waals vdwtype = user vdw-modifier = none rvdw = 1.4 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = EnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.12 ; EWALD/PME/PPPM parameters pme_order = 6 ewald_rtol = 1e-06 epsilon_surface = 0 ; Temperature coupling Tcoupl = nose-hoover tc_grps = system tau_t = 1.0 ref_t = some_temperature ; Pressure coupling is off for NVT Pcoupl = No tau_p = 0.5 compressibility = 4.5e-05 ref_p = 1.0 ; options for bonds constraints = all-bonds constraint_algorithm = lincs On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and nvidia-smi shows the process name. We're investigating and it does appear that it is EM that uses cutoff electrostatics and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff? Thanks, Alex On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < mark.j.abra...@gmail.com wrote: No. Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running. Mark On Fri, Apr 27, 2018, 20:33
Re: [gmx-users] strange GPU load distribution
On 5/6/18 5:51 PM, Alex wrote: Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low, while their PIDs show up in nvidia-smi. After about a minute all goes back to normal. Because the user is doing it frequently (scripted), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS) and that GPU is never touched by these EM jobs. Any ideas will be greatly appreciated. Thinking out loud - a run that explicitly calls for only the CPU to be used might be trying to detect GPU if mdrun is GPU-enabled. Is that a possibility, including any latency in detecting that device? Have you tested to make sure that an mdrun binary that is explicitly disabled from using GPU (-DGMX_GPU=OFF) doesn't affect the GPU usage when running the same command? -Justin Thanks, Alex PID TTY STAT TIME COMMAND 60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. Alex *** gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm mdp: ; Run control integrator = md-vv ; Velocity Verlet tinit = 0 dt = 0.002 nsteps = 50 ; 1 ns nstcomm = 100 ; Output control nstxout = 5 nstvout = 5 nstfout = 0 nstlog = 5 nstenergy = 5 nstxout-compressed = 0 ; Neighborsearching and short-range nonbonded interactions cutoff-scheme = group nstlist = 10 ns_type = grid pbc = xyz rlist = 1.4 ; Electrostatics coulombtype = cutoff rcoulomb = 1.4 ; van der Waals vdwtype = user vdw-modifier = none rvdw = 1.4 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = EnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.12 ; EWALD/PME/PPPM parameters pme_order = 6 ewald_rtol = 1e-06 epsilon_surface = 0 ; Temperature coupling Tcoupl = nose-hoover tc_grps = system tau_t = 1.0 ref_t = some_temperature ; Pressure coupling is off for NVT Pcoupl = No tau_p = 0.5 compressibility = 4.5e-05 ref_p = 1.0 ; options for bonds constraints = all-bonds constraint_algorithm = lincs On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and nvidia-smi shows the process name. We're investigating and it does appear that it is EM that uses cutoff electrostatics and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff? 
Thanks, Alex On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < mark.j.abra...@gmail.com wrote: No. Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running. Mark On Fri, Apr 27, 2018, 20:33 Alex wrote: Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs? Thanks, Alex On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll < pall.szil...@gmail.com wrote: The second column is PIDs so there is a whole lot more going on there than just a
Re: [gmx-users] strange GPU load distribution
Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low, while their PIDs show up in nvidia-smi. After about a minute all goes back to normal. Because the user is doing it frequently (scripted), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS) and that GPU is never touched by these EM jobs. Any ideas will be greatly appreciated. Thanks, Alex PID TTY STAT TIME COMMAND 60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. Alex *** gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm mdp: ; Run control integrator = md-vv ; Velocity Verlet tinit= 0 dt = 0.002 nsteps = 50; 1 ns nstcomm = 100 ; Output control nstxout = 5 nstvout = 5 nstfout = 0 nstlog = 5 nstenergy= 5 nstxout-compressed = 0 ; Neighborsearching and short-range nonbonded interactions cutoff-scheme= group nstlist = 10 ns_type = grid pbc = xyz rlist= 1.4 ; Electrostatics coulombtype = cutoff rcoulomb = 1.4 ; van der Waals vdwtype = user vdw-modifier = none rvdw = 1.4 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = EnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.12 ; EWALD/PME/PPPM parameters pme_order= 6 ewald_rtol = 1e-06 epsilon_surface = 0 ; Temperature coupling Tcoupl = nose-hoover tc_grps = system tau_t= 1.0 ref_t= some_temperature ; Pressure coupling is off for NVT Pcoupl = No tau_p= 0.5 compressibility = 4.5e-05 ref_p= 1.0 ; options for bonds constraints = all-bonds constraint_algorithm = lincs On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and nvidia-smi shows the process name. We're investigating and it does appear that it is EM that uses cutoff electrostatics and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff? Thanks, Alex On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < mark.j.abra...@gmail.com wrote: No. Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running. Mark On Fri, Apr 27, 2018, 20:33 Alex wrote: Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. 
insert-molecules, grompp, or energy) trying to use GPUs? Thanks, Alex On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll < pall.szil...@gmail.com wrote: The second column is PIDs so there is a whole lot more going on there than just a single simulation, single rank using two GPUs. That would be one PID and two entries for the two GPUs. Are you sure you're not running other processes? -- Szilárd On Thu, Apr 26, 2018 at 5:52 AM, Alex wrote: Hi all, I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122 Once in a while the simulation slows down and nvidia-smi reports
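For context on the expected layout with that command: -ntmpi 4 -npme 1 gives three PP ranks plus one PME rank, and -gputasks 1122 maps their GPU tasks, in order, onto device IDs. A sketch of the intended reading (the annotation is mine, not from the original log):

gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
# four GPU tasks mapped in order to devices 1, 1, 2, 2 -- two tasks per GPU
# with thread-MPI this is a single process, so a healthy run should show one
# gmx PID on GPU 1 and GPU 2 in nvidia-smi, not the dozens of PIDs seen here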
Re: [gmx-users] strange GPU load distribution
Hi Mark, We checked and one example is below. Thanks, Alex PID TTY STAT TIME COMMAND 60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. Alex *** gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm mdp: ; Run control integrator = md-vv ; Velocity Verlet tinit= 0 dt = 0.002 nsteps = 50; 1 ns nstcomm = 100 ; Output control nstxout = 5 nstvout = 5 nstfout = 0 nstlog = 5 nstenergy= 5 nstxout-compressed = 0 ; Neighborsearching and short-range nonbonded interactions cutoff-scheme= group nstlist = 10 ns_type = grid pbc = xyz rlist= 1.4 ; Electrostatics coulombtype = cutoff rcoulomb = 1.4 ; van der Waals vdwtype = user vdw-modifier = none rvdw = 1.4 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = EnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.12 ; EWALD/PME/PPPM parameters pme_order= 6 ewald_rtol = 1e-06 epsilon_surface = 0 ; Temperature coupling Tcoupl = nose-hoover tc_grps = system tau_t= 1.0 ref_t= some_temperature ; Pressure coupling is off for NVT Pcoupl = No tau_p= 0.5 compressibility = 4.5e-05 ref_p= 1.0 ; options for bonds constraints = all-bonds constraint_algorithm = lincs On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and nvidia-smi shows the process name. We're investigating and it does appear that it is EM that uses cutoff electrostatics and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff? Thanks, Alex On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < mark.j.abra...@gmail.com wrote: No. Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running. Mark On Fri, Apr 27, 2018, 20:33 Alex wrote: Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs? Thanks, Alex On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll < pall.szil...@gmail.com wrote: The second column is PIDs so there is a whole lot more going on there than just a single simulation, single rank using two GPUs. That would be one PID and two entries for the two GPUs. Are you sure you're not running other processes? 
-- Szilárd

On Thu, Apr 26, 2018 at 5:52 AM, Alex wrote: Hi all, I am running GMX 2018 with

gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122

Once in a while the simulation slows down and nvidia-smi reports something like this:

|1 12981 C gmx 175MiB |
|2 12981 C gmx 217MiB |
|2 13083 C gmx 161MiB |
|2 13086 C gmx 159MiB |
|2 13089 C gmx 139MiB |
|2 13093 C gmx 163MiB |
|2 13096 C gmx 11MiB |
|2 13099 C gmx 8MiB |
|2 13102 C gmx 8MiB |
|2 13106 C gmx 8MiB |
|2 13109 C gmx 8MiB |
|2 13112
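A quick way to see what those extra PIDs actually are is to feed them back to ps, for example with a few of the PIDs from the listing above:

ps -o pid,user,stat,cmd -p 12981,13083,13096

which distinguishes the simulation that owns the large allocations from processes that have merely opened a small CUDA context during start-up.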
Re: [gmx-users] strange GPU load distribution
I see. :) I will check again when I am back at work. Thanks! Alex On 4/27/2018 2:16 PM, Mark Abraham wrote: Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case: gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm On 4/27/2018 1:52 PM, Mark Abraham wrote: Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were. Mark On Fri, Apr 27, 2018, 21:37 Alex wrote: Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run: and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. mdrun call and mdp below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated. Alex *** gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm mdp: ; Run control integrator = md-vv ; Velocity Verlet tinit= 0 dt = 0.002 nsteps = 50; 1 ns nstcomm = 100 ; Output control nstxout = 5 nstvout = 5 nstfout = 0 nstlog = 5 nstenergy= 5 nstxout-compressed = 0 ; Neighborsearching and short-range nonbonded interactions cutoff-scheme= group nstlist = 10 ns_type = grid pbc = xyz rlist= 1.4 ; Electrostatics coulombtype = cutoff rcoulomb = 1.4 ; van der Waals vdwtype = user vdw-modifier = none rvdw = 1.4 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = EnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.12 ; EWALD/PME/PPPM parameters pme_order= 6 ewald_rtol = 1e-06 epsilon_surface = 0 ; Temperature coupling Tcoupl = nose-hoover tc_grps = system tau_t= 1.0 ref_t= some_temperature ; Pressure coupling is off for NVT Pcoupl = No tau_p= 0.5 compressibility = 4.5e-05 ref_p= 1.0 ; options for bonds constraints = all-bonds constraint_algorithm = lincs On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: As I said, only two users, and nvidia-smi shows the process name. We're investigating and it does appear that it is EM that uses cutoff electrostatics and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff? Thanks, Alex On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < mark.j.abra...@gmail.com wrote: No. Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running. Mark On Fri, Apr 27, 2018, 20:33 Alex wrote: Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs? Thanks, Alex On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll < pall.szil...@gmail.com wrote: The second column is PIDs so there is a whole lot more going on there than just a single simulation, single rank using two GPUs. That would be one PID and two entries for the two GPUs. Are you sure you're not running other processes? 
-- Szilárd On Thu, Apr 26, 2018 at 5:52 AM, Alex wrote: Hi all, I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122 Once in a while the simulation slows down and nvidia-smi reports something like this: |1 12981 C gmx 175MiB | |2 12981 C gmx 217MiB | |2 13083 C gmx 161MiB | |2 13086 C gmx 159MiB | |2 13089 C gmx 139MiB | |2 13093 C gmx 163MiB | |2 13096 C gmx 11MiB | |2 13099 C gmx 8MiB | |2 13102 C gmx 8MiB | |2 13106 C gmx 8MiB | |2 13109 C gmx 8MiB | |2 13112 C gmx 8MiB | |2 13115 C gmx 8MiB | |2 13119 C gmx 8MiB | |2 13122 C gmx 8MiB | |2
Re: [gmx-users] strange GPU load distribution
Hi, What you think was run isn't nearly as useful when troubleshooting as asking the kernel what is actually running. Mark On Fri, Apr 27, 2018, 21:59 Alexwrote: > Mark, I copied the exact command line from the script, right above the > mdp file. It is literally how the script calls mdrun in this case: > > gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm > > > On 4/27/2018 1:52 PM, Mark Abraham wrote: > > Group cutoff scheme can never run on a gpu, so none of that should > matter. > > Use ps and find out what the command lines were. > > > > Mark > > > > On Fri, Apr 27, 2018, 21:37 Alex wrote: > > > >> Update: we're basically removing commands one by one from the script > that > >> submits the jobs causing the issue. The culprit is both EM and the MD > run: > >> and GPUs are being affected _before_ MD starts loading the CPU, i.e. > this > >> is the initial setting up of the EM run -- CPU load is near zero, > >> nvidia-smi reports the mess. I wonder if this is in any way related to > that > >> timing test we were failing a while back. > >> mdrun call and mdp below, though I suspect they have nothing to do with > >> what is happening. Any help will be very highly appreciated. > >> > >> Alex > >> > >> *** > >> > >> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm > >> > >> mdp: > >> > >> ; Run control > >> integrator = md-vv ; Velocity Verlet > >> tinit= 0 > >> dt = 0.002 > >> nsteps = 50; 1 ns > >> nstcomm = 100 > >> ; Output control > >> nstxout = 5 > >> nstvout = 5 > >> nstfout = 0 > >> nstlog = 5 > >> nstenergy= 5 > >> nstxout-compressed = 0 > >> ; Neighborsearching and short-range nonbonded interactions > >> cutoff-scheme= group > >> nstlist = 10 > >> ns_type = grid > >> pbc = xyz > >> rlist= 1.4 > >> ; Electrostatics > >> coulombtype = cutoff > >> rcoulomb = 1.4 > >> ; van der Waals > >> vdwtype = user > >> vdw-modifier = none > >> rvdw = 1.4 > >> ; Apply long range dispersion corrections for Energy and Pressure > >> DispCorr = EnerPres > >> ; Spacing for the PME/PPPM FFT grid > >> fourierspacing = 0.12 > >> ; EWALD/PME/PPPM parameters > >> pme_order= 6 > >> ewald_rtol = 1e-06 > >> epsilon_surface = 0 > >> ; Temperature coupling > >> Tcoupl = nose-hoover > >> tc_grps = system > >> tau_t= 1.0 > >> ref_t= some_temperature > >> ; Pressure coupling is off for NVT > >> Pcoupl = No > >> tau_p= 0.5 > >> compressibility = 4.5e-05 > >> ref_p= 1.0 > >> ; options for bonds > >> constraints = all-bonds > >> constraint_algorithm = lincs > >> > >> > >> > >> > >> > >> > >> On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote: > >> > >>> As I said, only two users, and nvidia-smi shows the process name. We're > >>> investigating and it does appear that it is EM that uses cutoff > >>> electrostatics and as a result the user did not bother with -pme cpu in > >> the > >>> mdrun call. What would be the correct way to enforce cpu-only mdrun > when > >>> coulombtype = cutoff? > >>> > >>> Thanks, > >>> > >>> Alex > >>> > >>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham < > mark.j.abra...@gmail.com > >>> > >>> wrote: > >>> > No. > > Look at the processes that are running, e.g. with top or ps. Either > old > simulations or another user is running. > > Mark > > On Fri, Apr 27, 2018, 20:33 Alex wrote: > > > Strange. There are only two people using this machine, myself being > >> one > of > > them, and the other person specifically forces -nb cpu -pme cpu in > his > > calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, > grompp, > > or energy) trying to use GPUs? 
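For reference, a minimal sketch (not from the original thread) of how the PIDs that nvidia-smi reports can be traced back to the command lines the kernel is actually running, along the lines Mark suggests. It assumes a Linux box with procps and the NVIDIA driver utilities; the PIDs 12981 and 13083 are placeholders copied from the nvidia-smi output quoted elsewhere in the thread.

# List every process that currently holds a compute context on any GPU.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Ask the kernel for the full command line behind a couple of those PIDs
# (procps accepts a comma-separated PID list).
ps -o pid,user,etime,args -p 12981,13083

# Or inspect every GPU-attached PID in one go.
ps -o pid,user,args -p "$(nvidia-smi --query-compute-apps=pid --format=csv,noheader | paste -sd, -)"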
Re: [gmx-users] strange GPU load distribution
Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case:

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

On 4/27/2018 1:52 PM, Mark Abraham wrote:
> Group cutoff scheme can never run on a gpu, so none of that should matter.
> Use ps and find out what the command lines were.
>
> Mark
Re: [gmx-users] strange GPU load distribution
Group cutoff scheme can never run on a gpu, so none of that should matter. Use ps and find out what the command lines were.

Mark

On Fri, Apr 27, 2018, 21:37 Alex wrote:
> Update: we're basically removing commands one by one from the script that
> submits the jobs causing the issue. The culprit is both EM and the MD run,
> and GPUs are being affected _before_ MD starts loading the CPU, i.e. this
> is the initial setting up of the EM run -- CPU load is near zero, yet
> nvidia-smi reports the mess. I wonder if this is in any way related to that
> timing test we were failing a while back.
> The mdrun call and mdp are below, though I suspect they have nothing to do
> with what is happening. Any help will be very highly appreciated.
>
> Alex
>
> ***
>
> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
Re: [gmx-users] strange GPU load distribution
Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both EM and the MD run, and GPUs are being affected _before_ MD starts loading the CPU, i.e. this is the initial setting up of the EM run -- CPU load is near zero, yet nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back. The mdrun call and mdp are below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated.

Alex

***

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

mdp:

; Run control
integrator               = md-vv   ; Velocity Verlet
tinit                    = 0
dt                       = 0.002
nsteps                   = 50      ; 1 ns
nstcomm                  = 100
; Output control
nstxout                  = 5
nstvout                  = 5
nstfout                  = 0
nstlog                   = 5
nstenergy                = 5
nstxout-compressed       = 0
; Neighborsearching and short-range nonbonded interactions
cutoff-scheme            = group
nstlist                  = 10
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.4
; Electrostatics
coulombtype              = cutoff
rcoulomb                 = 1.4
; van der Waals
vdwtype                  = user
vdw-modifier             = none
rvdw                     = 1.4
; Apply long range dispersion corrections for Energy and Pressure
DispCorr                 = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 6
ewald_rtol               = 1e-06
epsilon_surface          = 0
; Temperature coupling
Tcoupl                   = nose-hoover
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = some_temperature
; Pressure coupling is off for NVT
Pcoupl                   = No
tau_p                    = 0.5
compressibility          = 4.5e-05
ref_p                    = 1.0
; options for bonds
constraints              = all-bonds
constraint_algorithm     = lincs

On Fri, Apr 27, 2018 at 1:14 PM, Alex wrote:
> As I said, only two users, and nvidia-smi shows the process name. We're
> investigating and it does appear that it is EM that uses cutoff
> electrostatics and as a result the user did not bother with -pme cpu in the
> mdrun call. What would be the correct way to enforce cpu-only mdrun when
> coulombtype = cutoff?
>
> Thanks,
>
> Alex
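A hedged sketch of what the submission script's per-job calls could look like if every mdrun task is explicitly kept on the CPU; the file names (run.mdp, conf.gro, topol.top, run) are placeholders rather than the user's actual names, and with cutoff-scheme = group and coulombtype = cutoff there is no GPU-capable nonbonded or PME task to offload in any case.

# Placeholder file names throughout; the -nb cpu / -pme cpu flags are the part that matters.
gmx grompp -f run.mdp -c conf.gro -p topol.top -o run.tpr
gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm run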
Re: [gmx-users] strange GPU load distribution
As I said, only two users, and nvidia-smi shows the process name. We're investigating, and it does appear that it is EM that uses cutoff electrostatics, and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce cpu-only mdrun when coulombtype = cutoff?

Thanks,

Alex

On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham wrote:
> No.
>
> Look at the processes that are running, e.g. with top or ps. Either old
> simulations or another user is running.
>
> Mark
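Since nvidia-smi only shows a process name, a rough way to check whether a particular supposedly CPU-only job ever acquires a GPU context is to compare its PID against the driver's list. A sketch only, assuming bash and that the job is launched from the same shell; "run" is again a placeholder job name.

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm run &   # placeholder CPU-only job
job_pid=$!
sleep 10                                         # give mdrun time to initialise
if nvidia-smi --query-compute-apps=pid --format=csv,noheader | grep -qw "$job_pid"; then
    echo "PID $job_pid holds a GPU context"
else
    echo "PID $job_pid is not touching the GPUs"
fi
wait "$job_pid"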
Re: [gmx-users] strange GPU load distribution
No.

Look at the processes that are running, e.g. with top or ps. Either old simulations or another user is running.

Mark

On Fri, Apr 27, 2018, 20:33 Alex wrote:
> Strange. There are only two people using this machine, myself being one of
> them, and the other person specifically forces -nb cpu -pme cpu in his
> calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp,
> or energy) trying to use GPUs?
>
> Thanks,
>
> Alex
Re: [gmx-users] strange GPU load distribution
Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs?

Thanks,

Alex

On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll wrote:
> The second column is PIDs, so there is a whole lot more going on there than
> just a single simulation with a single rank using two GPUs. That would be
> one PID and two entries for the two GPUs. Are you sure you're not running
> other processes?
>
> --
> Szilárd
Re: [gmx-users] strange GPU load distribution
The second column is PIDs, so there is a whole lot more going on there than just a single simulation with a single rank using two GPUs. That would be one PID and two entries for the two GPUs. Are you sure you're not running other processes?

--
Szilárd

On Thu, Apr 26, 2018 at 5:52 AM, Alex wrote:
> Hi all,
>
> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4
> -npme 1 -pme gpu -nb gpu -gputasks 1122
>
> Once in a while the simulation slows down and nvidia-smi reports something
> like this:
>
> |    1     12981      C   gmx                                      175MiB |
> |    2     12981      C   gmx                                      217MiB |
> |    2     13083      C   gmx                                      161MiB |
> |    2     13086      C   gmx                                      159MiB |
> |    2     13089      C   gmx                                      139MiB |
> |    2     13093      C   gmx                                      163MiB |
> |    2     13096      C   gmx                                       11MiB |
> |    2     13099      C   gmx                                        8MiB |
> |    2     13102      C   gmx                                        8MiB |
> |    2     13106      C   gmx                                        8MiB |
> |    2     13109      C   gmx                                        8MiB |
> |    2     13112      C   gmx                                        8MiB |
> |    2     13115      C   gmx                                        8MiB |
> |    2     13119      C   gmx                                        8MiB |
> |    2     13122      C   gmx                                        8MiB |
> |    2     13125      C   gmx                                        8MiB |
> |    2     13128      C   gmx                                        8MiB |
> |    2     13131      C   gmx                                        8MiB |
> |    2     13134      C   gmx                                        8MiB |
> |    2     13138      C   gmx                                        8MiB |
> |    2     13141      C   gmx                                        8MiB |
> +-----------------------------------------------------------------------------+
>
> Then it goes back to the expected load. Is this normal?
>
> Thanks,
>
> Alex
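To make Szilárd's point concrete, a short sketch (not from the thread) of what the quoted command should look like on the GPUs, plus a quick way to count the PIDs; it assumes the GROMACS 2018 task-mapping behaviour described above and a reasonably recent nvidia-smi.

# gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
#   -ntmpi 4 -npme 1  ->  3 PP ranks + 1 PME rank, all threads of one process (thread-MPI)
#   -gputasks 1122    ->  those four GPU tasks are placed on devices 1, 1, 2, 2
# so nvidia-smi should show a single gmx PID, listed once on GPU 1 and once on GPU 2.

# Count the distinct PIDs that currently hold GPU contexts; anything beyond one
# per running simulation means other processes are involved.
nvidia-smi --query-compute-apps=pid --format=csv,noheader | sort -u | wc -l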