Re: Review Request 48376: Changed semantics for granting access to /dev/nvidiactl, etc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48376/#review138097 --- Ship it! - Benjamin Mahler On June 11, 2016, 3:05 a.m., Kevin Klues wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48376/ > --- > > (Updated June 11, 2016, 3:05 a.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS- > https://issues.apache.org/jira/browse/MESOS- > > > Repository: mesos > > > Description > --- > > Previously, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` was > only granted to / revoked from a container as GPUs were added and > removed from them. On some level, this makes sense because most jobs > don't need access to these devices unless they are also using a GPU. > However, there are cases when access to these files is appropriate, > even when not making use of a GPU. Running `nvidia-smi` to control the > global state of the underlying nvidia driver, for example. > > This commit adds `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default > whitelist of devices to include in every container when the > `gpu/nvidia` isolator is enabled. This allows a container to run > standard nvidia driver tools (such as `nvidia-smi`) without failing > with abnormal errors when no GPUs have been granted to it. As such, > these tools will now report that no GPUs are installed instead of > failing abnormally. > > NOTE: Once we allow GPUs to be granted to containers with filesystem > isolation turned on, other criteria will be used to determine when / > if to grant access to these control devices. > > > Diffs > - > > src/slave/containerizer/mesos/isolators/gpu/isolator.cpp > d7557a0c338e8c0e51461b2326600c03f89c2e8b > > Diff: https://reviews.apache.org/r/48376/diff/ > > > Testing > --- > > GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests > > > Thanks, > > Kevin Klues > >
Re: Review Request 48376: Changed semantics for granting access to /dev/nvidiactl, etc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48376/#review137098 --- Bad patch! Reviews applied: [48376, 48375, 48374, 48373, 48372, 48371, 48370, 48369, 48368, 48367, 48366, 48365, 48364, 48363, 48578] Failed command: ./support/apply-review.sh -n -r 48365 Error: 2016-06-11 03:31:29 URL:https://reviews.apache.org/r/48365/diff/raw/ [16129/16129] -> "48365.patch" [1] error: missing binary patch data for '3rdparty/nvml-352.79.tar.gz' error: binary patch does not apply to '3rdparty/nvml-352.79.tar.gz' error: 3rdparty/nvml-352.79.tar.gz: patch does not apply Full log: https://builds.apache.org/job/mesos-reviewbot/13683/console - Mesos ReviewBot On June 11, 2016, 3:05 a.m., Kevin Klues wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48376/ > --- > > (Updated June 11, 2016, 3:05 a.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS- > https://issues.apache.org/jira/browse/MESOS- > > > Repository: mesos > > > Description > --- > > Previously, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` was > only granted to / revoked from a container as GPUs were added and > removed from them. On some level, this makes sense because most jobs > don't need access to these devices unless they are also using a GPU. > However, there are cases when access to these files is appropriate, > even when not making use of a GPU. Running `nvidia-smi` to control the > global state of the underlying nvidia driver, for example. > > This commit adds `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default > whitelist of devices to include in every container when the > `gpu/nvidia` isolator is enabled. This allows a container to run > standard nvidia driver tools (such as `nvidia-smi`) without failing > with abnormal errors when no GPUs have been granted to it. As such, > these tools will now report that no GPUs are installed instead of > failing abnormally. > > NOTE: Once we allow GPUs to be granted to containers with filesystem > isolation turned on, other criteria will be used to determine when / > if to grant access to these control devices. > > > Diffs > - > > src/slave/containerizer/mesos/isolators/gpu/isolator.cpp > d7557a0c338e8c0e51461b2326600c03f89c2e8b > > Diff: https://reviews.apache.org/r/48376/diff/ > > > Testing > --- > > GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests > > > Thanks, > > Kevin Klues > >
Re: Review Request 48376: Changed semantics for granting access to /dev/nvidiactl, etc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48376/ --- (Updated June 11, 2016, 3:05 a.m.) Review request for mesos and Benjamin Mahler. Changes --- Rebased for https://reviews.apache.org/r/48578 Bugs: MESOS- https://issues.apache.org/jira/browse/MESOS- Repository: mesos Description --- Previously, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` was only granted to / revoked from a container as GPUs were added and removed from them. On some level, this makes sense because most jobs don't need access to these devices unless they are also using a GPU. However, there are cases when access to these files is appropriate, even when not making use of a GPU. Running `nvidia-smi` to control the global state of the underlying nvidia driver, for example. This commit adds `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist of devices to include in every container when the `gpu/nvidia` isolator is enabled. This allows a container to run standard nvidia driver tools (such as `nvidia-smi`) without failing with abnormal errors when no GPUs have been granted to it. As such, these tools will now report that no GPUs are installed instead of failing abnormally. NOTE: Once we allow GPUs to be granted to containers with filesystem isolation turned on, other criteria will be used to determine when / if to grant access to these control devices. Diffs (updated) - src/slave/containerizer/mesos/isolators/gpu/isolator.cpp d7557a0c338e8c0e51461b2326600c03f89c2e8b Diff: https://reviews.apache.org/r/48376/diff/ Testing --- GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests Thanks, Kevin Klues
Re: Review Request 48376: Changed semantics for granting access to /dev/nvidiactl, etc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48376/#review136561 --- Bad patch! Reviews applied: [48376, 48375, 48374, 48373, 48372, 48371, 48370, 48369, 48368, 48367, 48366, 48365, 48364, 48363] Failed command: ./support/apply-review.sh -n -r 48365 Error: 2016-06-07 23:14:22 URL:https://reviews.apache.org/r/48365/diff/raw/ [16129/16129] -> "48365.patch" [1] error: missing binary patch data for '3rdparty/nvml-352.79.tar.gz' error: binary patch does not apply to '3rdparty/nvml-352.79.tar.gz' error: 3rdparty/nvml-352.79.tar.gz: patch does not apply Full log: https://builds.apache.org/job/mesos-reviewbot/13608/console - Mesos ReviewBot On June 7, 2016, 10:48 p.m., Kevin Klues wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48376/ > --- > > (Updated June 7, 2016, 10:48 p.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS- > https://issues.apache.org/jira/browse/MESOS- > > > Repository: mesos > > > Description > --- > > Previously, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` was > only granted to / revoked from a container as GPUs were added and > removed from them. On some level, this makes sense because most jobs > don't need access to these devices unless they are also using a GPU. > However, there are cases when access to these files is appropriate, > even when not making use of a GPU. Running `nvidia-smi` to control the > global state of the underlying nvidia driver, for example. > > This commit adds `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default > whitelist of devices to include in every container when the > `gpu/nvidia` isolator is enabled. This allows a container to run > standard nvidia driver tools (such as `nvidia-smi`) without failing > with abnormal errors when no GPUs have been granted to it. As such, > these tools will now report that no GPUs are installed instead of > failing abnormally. > > NOTE: Once we allow GPUs to be granted to containers with filesystem > isolation turned on, other criteria will be used to determine when / > if to grant access to these control devices. > > > Diffs > - > > src/slave/containerizer/mesos/isolators/gpu/isolator.cpp > 7849e518af448d9557ca4d6de4ccaba8bc572992 > > Diff: https://reviews.apache.org/r/48376/diff/ > > > Testing > --- > > GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests > > > Thanks, > > Kevin Klues > >
Review Request 48376: Changed semantics for granting access to /dev/nvidiactl, etc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48376/ --- Review request for mesos and Benjamin Mahler. Bugs: MESOS- https://issues.apache.org/jira/browse/MESOS- Repository: mesos Description --- Previously, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` was only granted to / revoked from a container as GPUs were added and removed from them. On some level, this makes sense because most jobs don't need access to these devices unless they are also using a GPU. However, there are cases when access to these files is appropriate, even when not making use of a GPU. Running `nvidia-smi` to control the global state of the underlying nvidia driver, for example. This commit adds `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist of devices to include in every container when the `gpu/nvidia` isolator is enabled. This allows a container to run standard nvidia driver tools (such as `nvidia-smi`) without failing with abnormal errors when no GPUs have been granted to it. As such, these tools will now report that no GPUs are installed instead of failing abnormally. NOTE: Once we allow GPUs to be granted to containers with filesystem isolation turned on, other criteria will be used to determine when / if to grant access to these control devices. Diffs - src/slave/containerizer/mesos/isolators/gpu/isolator.cpp 7849e518af448d9557ca4d6de4ccaba8bc572992 Diff: https://reviews.apache.org/r/48376/diff/ Testing --- GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests Thanks, Kevin Klues