Hi Szilárd,
It really does appear that GMX_DISABLE_GPU_DETECTION=1 in the user's .bashrc
fixed it right up. We haven't tried his runs alongside GPU-accelerated jobs
yet, but he reports that none of his PIDs ever appear in nvidia-smi anymore and
overall his jobs start much faster.
This was an
I think we have everything ready at this point: a separate CPU-only binary
(not sourced yet) and these options. We've set GMX_DISABLE_GPU_DETECTION=1
in the user's .bashrc and will try the other option if this one fails. Will
update here on the bogging-down situation.
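For the record, all we added to his ~/.bashrc is the usual export line:

export GMX_DISABLE_GPU_DETECTION=1   # skip mdrun's GPU detection entirely

If we end up not wanting it globally, I assume the same variable could be set
per invocation instead (GMX_DISABLE_GPU_DETECTION=1 gmx mdrun ...), but we
have not tried it that way.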
Thanks a lot.
Alex
On Mon, May
Hi,
You have at least one option more elegant than using a separate binary for
EM.
Set the GMX_DISABLE_GPU_DETECTION=1 environment variable, which is the
internal GROMACS override that forces the detection off in cases similar to
yours. That should solve the detection latency. If for some reason it
Thanks Mark. No need to be sorry; a CPU-only build is a simple enough
fix. Inelegant, but if it works, it's all good. I'll report back as soon as
we have tried it.
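For the CPU-only binary, the plan is just a standard no-GPU configure,
roughly like this (the paths are placeholders, and our admin may add options
I am not showing here):

cmake .. -DGMX_GPU=OFF -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2018.1-cpu
make -j 8 && make install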
I myself run things in a way that you would find very familiar, but we
have a colleague developing forcefields and that involves tons of
Hi,
I don't see any problems there, but I note that there are run-time settings
for the driver/runtime to block until no other process is using the GPU,
which may be a contributing factor here.
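(If you want to check what is actually set on the cards in that respect,
something like

nvidia-smi -q -d COMPUTE

will print the compute mode of each GPU; whether that particular setting is
the relevant one on your boxes is only a guess on my part.)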
As Justin noted, if your EM jobs would use a build of GROMACS that is not
configured to have access to
Mark,
I am forwarding the response I received from the colleague who prepared
the box for my GMX install -- this is from the latest installation of
2018.1. See text below and please let me know what you think. We have no
problem rebuilding things, but would like to understand what is wrong
Hi Mark,
I forwarded your email to the person who installed CUDA on our boxes.
Just to be clear, there is no persistent occupancy of the GPUs _after_
the process has finished. The observation is as follows: EM jobs
submitted -> low CPU use by the EM jobs, GPUs bogged down, no output
files yet
Hi,
In 2018 and 2018.1, mdrun does indeed run GPU detection and compatibility
checks before any logic about whether it should use any GPUs that were in
fact detected. However, there's nothing about those checks that should a)
take any noticeable time, b) acquire any ongoing resources, or c) lead
On 5/6/18 6:11 PM, Alex wrote:
> A separate CPU-only build is what we were going to try, but if it
> succeeds with not touching GPUs, then what -- keep several builds?
If your CPU-only run produces something that doesn't touch the GPU
(which it shouldn't), that test would rather conclusively
A separate CPU-only build is what we were going to try, but if it
succeeds with not touching GPUs, then what -- keep several builds?
That latency you mention is definitely there; I think it is related to
my earlier report of one of the regression tests failing (I think Mark
might remember
On 5/6/18 5:51 PM, Alex wrote:
Unfortunately, we're still bogged down when the EM runs (example below)
start -- CPU usage by these jobs is initially low, while their PIDs show
up in nvidia-smi. After about a minute all goes back to normal. Because
the user is doing it frequently (scripted), everything is slowed down by
a
Hi Mark,
We checked and one example is below.
Thanks,
Alex
PID TTY STAT TIME COMMAND
60432 pts/8 Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb
cpu -pme cpu -deffnm em_steep
On 4/27/2018 2:16 PM, Mark Abraham wrote:
> Hi,
> What you think was run isn't nearly as useful
I see. :) I will check again when I am back at work.
Thanks!
Alex
Hi,
What you think was run isn't nearly as useful when troubleshooting as
asking the kernel what is actually running.
Mark
On Fri, Apr 27, 2018, 21:59 Alex wrote:
> Mark, I copied the exact command line from the script, right above the
> mdp file. It is literally how the
Mark, I copied the exact command line from the script, right above the
mdp file. It is literally how the script calls mdrun in this case:
gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
On 4/27/2018 1:52 PM, Mark Abraham wrote:
> Group cutoff scheme can never run on a gpu, so none of that should
Group cutoff scheme can never run on a gpu, so none of that should matter.
Use ps and find out what the command lines were.
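Something as simple as

ps aux | grep mdrun

(or whatever pattern matches the binary name you actually run) will show the
full command line of each matching process.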
Mark
On Fri, Apr 27, 2018, 21:37 Alex wrote:
> Update: we're basically removing commands one by one from the script that
> submits the jobs causing
Update: we're basically removing commands one by one from the script that
submits the jobs causing the issue. The culprit is both EM and the MD run,
and the GPUs are being affected _before_ MD starts loading the CPU, i.e. this
is the initial setting up of the EM run -- CPU load is near zero,
As I said, only two users, and nvidia-smi shows the process name. We're
investigating, and it does appear that it is the EM that uses cutoff
electrostatics, and as a result the user did not bother with -pme cpu in the
mdrun call. What would be the correct way to enforce a CPU-only mdrun when
coulombtype =
No.
Look at the processes that are running, e.g. with top or ps. Either old
simulations are still running, or another user is running something.
Mark
On Fri, Apr 27, 2018, 20:33 Alex wrote:
> Strange. There are only two people using this machine, myself being one of
> them, and the other person
Strange. There are only two people using this machine, myself being one of
them, and the other person specifically forces -nb cpu -pme cpu in his
calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp,
or energy) trying to use GPUs?
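For what it's worth, the way we are watching the cards is roughly

nvidia-smi --query-compute-apps=pid,process_name --format=csv

run repeatedly, so we see the PID and process name of whatever touches the
GPUs; if that is a poor way to check, please say so.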
Thanks,
Alex
On Fri, Apr 27, 2018 at 5:33
The second column shows PIDs, so there is a whole lot more going on there
than just a single simulation with a single rank using two GPUs. That would
be one PID and two entries, one for each of the two GPUs. Are you sure you're
not running other processes?
--
Szilárd
On Thu, Apr 26, 2018 at 5:52 AM, Alex