Follow-up questions:
- Are your 3D applications purely non-interactive? That is, is it ever
important for users to be able to interact with or see the output of the
applications in real time? If the answer is "never", then that
simplifies the solution.
- Do these applications use GLX? EGL? Vulkan? Do they have X11 GUIs
or need to use X11 for anything other than accessing the GPU?
If the applications are sometimes or always interactive, have X11 GUIs,
and use GLX or EGL to access a GPU, then ideally you would use TurboVNC
with VirtualGL and the VirtualGL EGL back end. The way that would work is:
1. The user submits a batch job.
2. The job scheduler picks an execution node and a GPU on that node.
3. The job scheduler starts a new TurboVNC session on the execution
node. (Note that some job schedulers require the -fg switch to be
passed to /opt/TurboVNC/bin/vncserver in order to prevent TurboVNC from
immediately backgrounding itself.)
4. The job scheduler temporarily changes the permissions and ownership
for the devices (/dev/dri/card*, /dev/dri/render*, /dev/nvidia*)
corresponding to the chosen GPU so that only the submitting user can
access the GPU.
5. The job scheduler executes the 3D application with DISPLAY pointed to
the newly-created TurboVNC session and VGL_DISPLAY pointed to the chosen
GPU's DRI device. (A rough sketch of steps 3-5 follows this list.)
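To make steps 3-5 more concrete, here is a minimal sketch of what a
Slurm batch/prolog fragment might look like. It assumes Bourne shell, a
hypothetical application name (my_3d_app), and a simplistic mapping from
the allocated GPU index to device nodes and display numbers; a real site
would need to adapt all of that.

  #!/bin/sh
  # Hypothetical sketch only; adjust GPU selection, device paths, and
  # display-number allocation for your site.

  GPU_INDEX=0                          # e.g. derived from the job's GPU allocation
  DRI_DEVICE=/dev/dri/card${GPU_INDEX}
  RENDER_NODE=/dev/dri/renderD$((128 + GPU_INDEX))  # assumes card N <-> renderD(128+N)

  # Step 4: restrict the chosen GPU's device nodes to the submitting user.
  # ($SLURM_JOB_USER is available in Slurm prolog scripts; shared nodes
  # such as /dev/nvidiactl are left untouched here.)
  chown "$SLURM_JOB_USER" "$DRI_DEVICE" "$RENDER_NODE" /dev/nvidia${GPU_INDEX}
  chmod 600 "$DRI_DEVICE" "$RENDER_NODE" /dev/nvidia${GPU_INDEX}

  # Step 3: start a TurboVNC session on a known display number, using -fg
  # so that it stays a child of the job rather than daemonizing.
  VNC_DISPLAY=:$((10 + GPU_INDEX))     # naive display-number choice
  /opt/TurboVNC/bin/vncserver -fg "$VNC_DISPLAY" &
  sleep 5                              # crude wait; a real script should poll

  # Step 5: run the 3D application in the new session, with VirtualGL's
  # EGL back end pointed at the chosen GPU's DRI device.
  DISPLAY=$VNC_DISPLAY VGL_DISPLAY=$DRI_DEVICE vglrun ./my_3d_app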
There are multiple ways in which TurboVNC sessions can be managed:
- Some sites use custom web portals that create a new TurboVNC session
through a job scheduler; populate a VNC "connection info" file with the
TurboVNC session's hostname, display number, and one-time password; and
download the connection info file to the user's browser, where it can be
opened with the TurboVNC Viewer. Re-connecting to the TurboVNC session
involves much the same process, except that the job scheduler simply
generates a new OTP for the existing session rather than starting a new
session. (The OTP flow is sketched after this list.)
- Some sites do basically the same thing without the web portal. In that
case, the job scheduler prints the hostname and display number of the
newly-created TurboVNC session, and users are required to enter that
information into the TurboVNC Viewer manually and authenticate with the
TurboVNC session using an authentication mechanism of the SysAdmin's
choosing. (TurboVNC supports static VNC passwords; Unix login
credentials or any other PAM-based authentication mechanism; one-time
passwords; time-based one-time passwords; X.509 certificates; and SSH.
SysAdmins can also force a particular authentication and encryption
mechanism to be used on a system-wide basis.)
- If the users have direct SSH access to the execution nodes, then they
could also use the TurboVNC Session Manager, which handles
authentication, encryption, and session management through SSH. (In
that case, a user would only need to know the hostname of the execution
node on which their session is running.)
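To illustrate the OTP flow from the first bullet and the Session Manager
connection from the last one, here is a rough sketch. The display number
(:1) and hostname (execnode17) are placeholders, and the exact
command-line options may vary with the TurboVNC release, so check the
vncpasswd/vncviewer documentation for your version.

  # On the execution node: mint a one-time password for an existing
  # TurboVNC session (":1" is a placeholder display number).
  /opt/TurboVNC/bin/vncpasswd -o -display :1

  # On the client, with the Session Manager enabled: give the TurboVNC
  # Viewer just the execution node's hostname, and session listing,
  # authentication, and tunneling are handled over SSH.
  /opt/TurboVNC/bin/vncviewer execnode17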
Potential wrinkles:
- The VirtualGL EGL back end generally works fine with straightforward
OpenGL applications, but there are a couple of esoteric application
behaviors (generally related to complex X11/OpenGL interactions) that
still trip it up. You would need to test your applications and make
sure that they all work properly with the EGL back end before declaring
that a 3D X server will never be necessary.
- If you are dealing with multi-GPU applications that expect to be able
to directly connect to separate GPU-attached X servers/screens in order
to access GPUs for the secondary rendering processes (e.g. ParaView
back in the day, before it supported EGL), then that complicates
things. It should still be possible to use VirtualGL as a GLX-to-EGL
translator in that case. It would just require special values of the
VGL_DISPLAY and VGL_READBACK environment variables to be set for each
rendering process (see the sketch after this list).
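As a rough illustration of that last point, the per-process setup could
be handled by a small wrapper around vglrun. The local-rank variable
(OMPI_COMM_WORLD_LOCAL_RANK) and the rank-to-device mapping below are
assumptions tied to Open MPI and a simple one-GPU-per-rank layout; other
launchers and layouts would need different logic.

  #!/bin/sh
  # Hypothetical per-rank wrapper, e.g. "mpirun -np 4 ./vglwrap ./my_renderer".
  LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}

  # Point each rendering process at its own GPU's DRI device (EGL back end).
  export VGL_DISPLAY=/dev/dri/card${LOCAL_RANK}

  # Secondary rendering processes don't drive the interactive display,
  # so skip the readback/transport step for them.
  if [ "$LOCAL_RANK" -ne 0 ]; then
      export VGL_READBACK=none
  fi

  exec vglrun "$@"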
If the 3D applications are purely non-interactive, then you wouldn't
necessarily need VirtualGL. VirtualGL is basically only useful for
displaying to a remote system, because the two most common Un*x remote
display use cases are: (1) remote X11 (client-side physical X display),
in which case you need VirtualGL in order to avoid sending OpenGL
primitives and data over the network, and (2) an X proxy (server-side
virtual X display), in which case you need VirtualGL because X proxies
lack GPU acceleration. You generally only need VGL if a 3D application
is displaying something that a user needs to see or interact with in
real time. However, you could still use VirtualGL as a GLX-to-EGL
translator if your non-interactive 3D applications use GLX to access a
GPU. If the non-interactive 3D application needs an X server for some
purpose, such as creating a dummy window or a Pixmap, then you could
start an Xvfb instance instead of TurboVNC, since the user would never
need to see or interact with the application's output in real time.
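A minimal sketch of that non-interactive arrangement, assuming a
hypothetical application name and that the allocated GPU's DRI device is
/dev/dri/card0:

  # Start a virtual, unaccelerated X server just to satisfy the
  # application's X11 dependency (display :99 is arbitrary).
  Xvfb :99 -screen 0 1280x1024x24 &
  XVFB_PID=$!

  # Run the GLX application with VirtualGL translating GLX to EGL on the
  # allocated GPU.
  DISPLAY=:99 VGL_DISPLAY=/dev/dri/card0 vglrun ./my_batch_renderer

  # Tear down the Xvfb instance when the job is done.
  kill $XVFB_PID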
tl;dr: I don't actually know how to start independent 3D X servers using
a job scheduler, and I'm not sure if starting GPU-attached Xorg
instances under non-root accounts is even possible. (Someone else
please correct me if I'm wrong.) Sites that use job schedulers and need
to use the VirtualGL GLX back end will typically run a full-time 3D X
server with a dedicated screen for each GPU. In that case, everything I
said above applies, except that you would point VGL_DISPLAY to the GPU's
screen rather than its DRI device. The full-time 3D X server shouldn't
use any GPU compute resources when it is idle, but it will use some GPU
memory (not a lot, though-- like 32-64 MB if the GPU is configured for
headless operation). However, the security situation is less palatable,
since nothing would technically prevent another user from pointing
VGL_DISPLAY to a screen attached to a GPU that has been allocated for
another user. I really think that the EGL back end is your best bet, if
you can make it work.
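For reference, the user-visible difference between the two back ends
comes down to the value of VGL_DISPLAY; the screen number and device
path below are placeholders:

  # GLX back end: point VirtualGL at the GPU's dedicated screen on the
  # full-time 3D X server (here, screen 1 of display :0).
  VGL_DISPLAY=:0.1 vglrun ./my_3d_app

  # EGL back end: point VirtualGL at the GPU's DRI device instead.
  VGL_DISPLAY=/dev/dri/card1 vglrun ./my_3d_app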
DRC
On 9/21/22 1:37 PM, Doug O'Neal wrote:
The cluster has nodes containing 3-8 NVIDIA GPUs each, with Slurm as
the scheduler. The GPUs are used mainly for AI and image processing.
Display to a remote system is a secondary use. Requirements will include:
* Only the user submitting the batch job has access to the GPU, and
the user has access only to the GPU(s) allocated through the batch
system.
* The ideal situation is for Xorg or an equivalent daemon to be started
when the batch job starts and killed when the job exits. The daemon
should run as the user, possibly with /dev/nvidia? owned by the
user. A chown can be included in the Slurm prolog script.
* If Xorg has to be running continuously, it should not take
resources (GPU system time or memory) away from the non-display
jobs when they have the GPU allocated. Do we need one daemon per
GPU, and how do we restrict access based on Slurm resource requests?
* More minor, but still a problem: running Xorg headless still blocks
access to the virtual consoles when using HPE servers and iLO to connect.