It might be useful to try a known-working configuration of distro + DE using 
VGL and TVNC with this hardware.

Could you advise on the most straightforward recipe, starting from scratch, 
that I can use to get a minimal and foolproof system up and running?
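
For context, this is the sort of recipe I have in mind, sketched from this 
thread and the docs rather than tested (the package names below are my 
assumptions for a supported distro such as CentOS):

# assumed: the official VirtualGL and turbovnc RPMs (or a repo for them) are available
sudo yum install -y VirtualGL turbovnc
# configure the local X server for VirtualGL, then log the console user out
sudo /opt/VirtualGL/bin/vglserver_config
# start a TurboVNC session that runs the whole desktop through VirtualGL;
# ~/gnome is the two-line "dbus-launch gnome-session" wrapper from below
/opt/TurboVNC/bin/vncserver -wm ~/gnome -vgl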

On Friday, 17 April 2020 23:21:39 UTC+1, Shak wrote:
>
> So despite my Plasma desktop being broken, I am able to use it "blind". I 
> managed to run glxspheres64 (120 fps), glmark2 (200), and gputest (300 fps), 
> and they all report that they use the IGD (though not at host performance, 
> apart from glxspheres64). So I guess even if I were able to run the DE via 
> VGL it wouldn't make a difference to my results (which was expected).
>
> Here are the results from glreadtest:
>
> *==== glreadtest ====*
> GLreadtest v2.6.3 (Build 20200214)
>
> /usr/bin/glreadtest -h for advanced usage.
> Rendering to Pbuffer (size = 701 x 701 pixels)
> Using 1-byte row alignment
>
> >>>>>>>>>>  PIXEL FORMAT:  RGB  <<<<<<<<<<
> glDrawPixels():   107.6 Mpixels/sec
> glReadPixels():   148.3 Mpixels/sec (min = 115.4, max = 156.2, sdev = 3.569)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  RGBA  <<<<<<<<<<
> glDrawPixels():   124.6 Mpixels/sec
> glReadPixels():   181.2 Mpixels/sec (min = 160.9, max = 197.7, sdev = 5.405)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  BGR  <<<<<<<<<<
> glDrawPixels():   107.9 Mpixels/sec
> glReadPixels():   149.0 Mpixels/sec (min = 135.5, max = 156.1, sdev = 2.881)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  BGRA  <<<<<<<<<<
> glDrawPixels():   104.7 Mpixels/sec
> glReadPixels():   142.6 Mpixels/sec (min = 125.9, max = 150.4, sdev = 3.375)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  ABGR  <<<<<<<<<<
> glDrawPixels():   105.0 Mpixels/sec
> glReadPixels():   143.7 Mpixels/sec (min = 130.7, max = 154.0, sdev = 2.879)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  ARGB  <<<<<<<<<<
> glDrawPixels():   118.6 Mpixels/sec
> glReadPixels():   143.7 Mpixels/sec (min = 112.5, max = 151.9, sdev = 4.444)
> glReadPixels() accounted for 100.00% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  RED  <<<<<<<<<<
> glDrawPixels():   110.2 Mpixels/sec
> glReadPixels():   157.9 Mpixels/sec (min = 122.3, max = 187.8, sdev = 6.647)
> glReadPixels() accounted for 100.00% of total readback time
>
> FB Config = 0x6a
>
> *==== glreadtest -pbo ====*
>
> GLreadtest v2.6.3 (Build 20200214)
> Using PBOs for readback
> Rendering to Pbuffer (size = 701 x 701 pixels)
> Using 1-byte row alignment
>
> >>>>>>>>>>  PIXEL FORMAT:  RGB  <<<<<<<<<<
> glDrawPixels():   112.2 Mpixels/sec
> glReadPixels():   172.4 Mpixels/sec (min = 113.4, max = 208.4, sdev = 20.38)
> glReadPixels() accounted for 96.69% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  RGBA  <<<<<<<<<<
> glDrawPixels():   124.1 Mpixels/sec
> glReadPixels():   241.5 Mpixels/sec (min = 157.6, max = 271.7, sdev = 14.43)
> glReadPixels() accounted for 0.6267% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  BGR  <<<<<<<<<<
> glDrawPixels():   107.6 Mpixels/sec
> glReadPixels():   143.5 Mpixels/sec (min = 114.4, max = 151.3, sdev = 3.703)
> glReadPixels() accounted for 97.27% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  BGRA  <<<<<<<<<<
> glDrawPixels():   104.1 Mpixels/sec
> glReadPixels():   247.1 Mpixels/sec (min = 197.5, max = 279.2, sdev = 13.49)
> glReadPixels() accounted for 0.6108% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  ABGR  <<<<<<<<<<
> glDrawPixels():   104.9 Mpixels/sec
> glReadPixels():   138.8 Mpixels/sec (min = 122.6, max = 145.3, sdev = 3.135)
> glReadPixels() accounted for 96.54% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  ARGB  <<<<<<<<<<
> glDrawPixels():   120.9 Mpixels/sec
> glReadPixels():   138.8 Mpixels/sec (min = 114.0, max = 147.7, sdev = 3.362)
> glReadPixels() accounted for 96.49% of total readback time
>
> >>>>>>>>>>  PIXEL FORMAT:  RED  <<<<<<<<<<
> glDrawPixels():   111.9 Mpixels/sec
> glReadPixels():   486.6 Mpixels/sec (min = 236.9, max = 638.9, sdev = 85.43)
> glReadPixels() accounted for 1.265% of total readback time
>
> FB Config = 0x6a
>
>
>
> On Friday, 17 April 2020 22:59:40 UTC+1, DRC wrote:
>>
>> I honestly have no idea.  I am successfully able to use your ~/gnome 
>> script on my CentOS 7 and 8 machines (one has an nVidia GPU, the other 
>> AMD), as long as I make the script executable.  The WM launches using 
>> VirtualGL, as expected.
>>
>> As for performance, it occurred to me that the Intel GPU might have 
>> slow pixel readback.  Try running '/opt/VirtualGL/bin/glreadtest' and 
>> '/opt/VirtualGL/bin/glreadtest -pbo' on the local display and post the 
>> results.  If one particular readback mode is slow but others are fast, then 
>> we can work around that by using environment variables to tell VirtualGL 
>> which mode to use.
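>>
>> (For illustration, and assuming the standard VirtualGL readback variables, 
>> the workaround might look roughly like this if PBO readback of RGBA/BGRA 
>> turns out to be the fast path:)
>>
>> # hedged sketch: select PBO readback and force an alpha channel so that
>> # VirtualGL reads back RGBA/BGRA rather than a slower pixel format
>> export VGL_READBACK=pbo
>> export VGL_FORCEALPHA=1
>> vglrun /opt/VirtualGL/bin/glxspheres64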
>>
>> DRC
>> On 4/17/20 4:48 PM, Shak wrote:
>>
>> *echo $LD_PRELOAD* returns empty, so something is up. But my main measure 
>> of failure was that glxspheres64 (and glmark2) say that they render with 
>> llvmpipe. Given that I am using the default xstartup.turbovnc, am I supposed 
>> to do something other than run *vncserver -wm ~/gnome -vgl*? (I use a script 
>> because I can't figure out how else to pass "dbus-launch gnome-session" to 
>> -wm.)
>>
>> Some more benchmarks. I'm quite new to OpenGL, so these were just found 
>> after some web searches. If there are obvious and useful ones I should run, 
>> please let me know.
>>
>> gputest on host: 2600fps
>> gputest via VNC: 370fps
>> vglrun -sp gputest via VNC: 400fps
>>
>> gfxbench (car chase) on host: 44fps
>> gfxbench (car chase) via VNC: won't run on llvmpipe
>> vglrun gfxbench (car chase) via VNC: 28fps
>>
>> On Friday, 17 April 2020 21:37:08 UTC+1, DRC wrote: 
>>>
>>> Bear in mind that passing -wm and -vgl to the vncserver script does 
>>> nothing but set environment variables (TVNC_WM and TVNC_VGL) that are 
>>> picked up by the default xstartup.turbovnc script, so make sure you are 
>>> using the default xstartup.turbovnc script.  It's easy to verify whether 
>>> the window manager is using VirtualGL.  Just open a terminal in the 
>>> TurboVNC session and echo the value of $LD_PRELOAD.  It should contain 
>>> something like "libdlfaker.so:libvglfaker.so" if VirtualGL is active, and 
>>> you should be able to run OpenGL applications in the session without 
>>> vglrun, and those applications should show that they are using the Intel 
>>> OpenGL renderer.
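>>>
>>> (In concrete terms, from a terminal inside the TurboVNC session it would 
>>> look something like the following; glxinfo is assumed to be installed from 
>>> your distro's Mesa utilities:)
>>>
>>> $ echo $LD_PRELOAD
>>> libdlfaker.so:libvglfaker.so
>>> $ glxinfo | grep "OpenGL renderer"
>>> OpenGL renderer string: Mesa DRI Intel(R) HD Graphics P4600/P4700 (HSW GT2)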
>>>
>>> As for the performance, you haven't mentioned any benchmarks you have 
>>> tested other than glmark2.  I've explained why that benchmark may be 
>>> demonstrating lackluster performance.  If you have other data points, then 
>>> please share them.
>>> On 4/17/20 2:34 PM, Shak wrote:
>>>
>>> I ran the commands you suggested (I went with -p 1m) and am still seeing 
>>> a big difference. I just find it strange to see it clearly working with 
>>> glxspheres64, but not much else. 
>>>
>>> $ *glxspheres64 -p 1000000*
>>> Polygons in scene: 999424 (61 spheres * 16384 polys/spheres)
>>> GLX FB config ID of window: 0xfe (8/8/8/0)
>>> Visual ID of window: 0x2bf
>>> Context is Direct
>>> OpenGL Renderer: llvmpipe (LLVM 9.0.1, 256 bits)
>>> 3.292760 frames/sec - 2.370366 Mpixels/sec
>>> 3.317006 frames/sec - 2.387820 Mpixels/sec
>>> $ *vglrun -sp glxspheres64 -p 1000000*
>>> Polygons in scene: 999424 (61 spheres * 16384 polys/spheres)
>>> GLX FB config ID of window: 0x6b (8/8/8/0)
>>> Visual ID of window: 0x288
>>> Context is Direct
>>> OpenGL Renderer: Mesa DRI Intel(R) HD Graphics P4600/P4700 (HSW GT2)
>>> 62.859812 frames/sec - 45.251019 Mpixels/sec
>>> 59.975806 frames/sec - 43.174903 Mpixels/sec
>>>
>>> BTW, GNOME is now working (where I ran the above). I'm trying to run the 
>>> whole desktop in VGL, but *vncserver -wm ~/gnome -vgl* doesn't seem to 
>>> do anything differently than it does without -vgl. Again, my gnome script 
>>> is:
>>>
>>> #!/bin/sh
>>> dbus-launch gnome-session
>>>
>>> That said, the desktop isn't broken now so that's an improvement on KDE. 
>>> But how can I run the whole of GNOME under VGL?
>>>
>>> I think if I can get the desktop running in VGL and still don't see the 
>>> performance in apps that I see locally (apart from in glxspheres!), I will 
>>> take that as the most I can do with my system over VNC (unless you find it 
>>> helpful for me to debug further).
>>>
>>> Thanks,
>>>
>>>
>>> On Friday, 17 April 2020 19:04:48 UTC+1, DRC wrote: 
>>>>
>>>> On 4/17/20 10:36 AM, Shak wrote:
>>>>
>>>> I ran glmark on the host display normally and then with software 
>>>> rendering. I've attached the results at the end of this message. I've 
>>>> attached this for completion rather than to contradict your hunch, but 
>>>> they 
>>>> do tie up with the numbers I see via VGL so I don't think this is a 
>>>> CPU/VNC 
>>>> issue.
>>>>
>>>> Hmmm...  Well, you definitely are seeing a much greater speedup with 
>>>> glmark2 absent VirtualGL, so I can only guess that the benchmark is 
>>>> fine-grained enough that it's being affected by VGL's per-frame overhead.  
>>>> A more realistic way to compare the two drivers would be using '[vglrun 
>>>> -sp] /opt/VirtualGL/bin/glxspheres -p {n}', where {n} is a fairly high 
>>>> number of polygons (at least 100,000.) 
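>>>>
>>>> (Concretely, something like the following pair, run inside the TurboVNC 
>>>> session:)
>>>>
>>>> /opt/VirtualGL/bin/glxspheres64 -p 1000000              # llvmpipe baseline
>>>> vglrun -sp /opt/VirtualGL/bin/glxspheres64 -p 1000000   # through VirtualGL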
>>>>
>>>>
>>>> I've tried repeating my experiments using GNOME, in case the issue is 
>>>> with KDE. However, I get the following when trying to run vglrun:
>>>>
>>>> $ *vglrun glxspheres64*
>>>> /usr/bin/vglrun: line 191: hostname: command not found
>>>> [VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
>>>> [VGL]    10.10.7.1, the IP address of your SSH client.
>>>> Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
>>>> libGL error: failed to authenticate magic 1
>>>> libGL error: failed to load driver: i965
>>>> GLX FB config ID of window: 0x6b (8/8/8/0)
>>>> Visual ID of window: 0x21
>>>> Context is Direct
>>>> OpenGL Renderer: llvmpipe (LLVM 9.0.1, 256 bits)
>>>> 17.228616 frames/sec - 17.859872 Mpixels/sec
>>>> 16.580449 frames/sec - 17.187957 Mpixels/sec
>>>>
>>>> You need to install whatever package provides /usr/bin/hostname for 
>>>> your Linux distribution.  That will eliminate the vglrun error, although 
>>>> it's probably unrelated to this problem. (Because of the error, vglrun is 
>>>> falsely detecting an X11-forward SSH environment and setting VGL_CLIENT, 
>>>> which would normally be used for the VGL Transport.  However, since 
>>>> VirtualGL auto-detects an X11 proxy environment and enables the X11 
>>>> Transport, the value of VGL_CLIENT should be ignored in this case.)
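>>>>
>>>> (On Arch, I believe /usr/bin/hostname is provided by the inetutils 
>>>> package, so presumably something like:)
>>>>
>>>> sudo pacman -S inetutils   # assumption: supplies /usr/bin/hostname on Arch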
>>>>
>>>> I honestly have no clue how to proceed.  I haven't observed these 
>>>> problems in any of the distributions I officially support, and I have no 
>>>> way to test Arch.
>>>>
>>>> I'm not sure what to make of these. I am using *vncserver -wm ~/gnome*, 
>>>> where gnome is the following script.
>>>>
>>>> #!/bin/sh
>>>> dbus-launch gnome-session
>>>>
>>>> I feel that I am close but still a way off. 
>>>>
>>>> FWIW, I have previously tried NoMachine, which is able to give me the 
>>>> perceived GL acceleration by "mirroring" my host display, but that just 
>>>> feels like the wrong way to achieve this (not least because it requires an 
>>>> attached monitor).
>>>>
>>>> Thanks,
>>>>
>>>> ==== RENDER TESTS ====
>>>>
>>>> $ *glmark2*
>>>> =======================================================
>>>>     glmark2 2017.07
>>>> =======================================================
>>>>     OpenGL Information
>>>>     GL_VENDOR:     Intel Open Source Technology Center
>>>>     GL_RENDERER:   Mesa DRI Intel(R) HD Graphics P4600/P4700 (HSW GT2)
>>>>     GL_VERSION:    3.0 Mesa 20.0.4
>>>> =======================================================
>>>> [build] use-vbo=false: FPS: 2493 FrameTime: 0.401 ms
>>>> =======================================================
>>>>                                   glmark2 Score: 2493
>>>> =======================================================
>>>>
>>>> $ *LIBGL_ALWAYS_SOFTWARE=1 glmark2*
>>>> ** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
>>>> ** Failed to set swap interval. Results may be bounded above by refresh rate.
>>>> =======================================================
>>>>     glmark2 2017.07
>>>> =======================================================
>>>>     OpenGL Information
>>>>     GL_VENDOR:     VMware, Inc.
>>>>     GL_RENDERER:   llvmpipe (LLVM 9.0.1, 256 bits)
>>>>     GL_VERSION:    3.1 Mesa 20.0.4
>>>> =======================================================
>>>> ** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
>>>> ** Failed to set swap interval. Results may be bounded above by refresh rate.
>>>> [build] use-vbo=false: FPS: 420 FrameTime: 2.381 ms
>>>> =======================================================
>>>>                                   glmark2 Score: 420
>>>> =======================================================
>>>>
>>>>
>>>> On Thursday, 16 April 2020 23:21:59 UTC+1, DRC wrote: 
>>>>>
>>>>> On 4/16/20 3:19 PM, Shak wrote:
>>>>>
>>>>> Thank you for the quick tips. I have posted some results at the end of 
>>>>> this post, but they seem inconsistent. glxspheres64 shows the correct 
>>>>> renderer respectively and the performance shows the 6x results I was 
>>>>> expecting. However I do not see the same gains in glmark2, even though it 
>>>>> also reports the correct renderer in each case. Again, I see a glmark of 
>>>>> 2000+ when running it in display :0.
>>>>>
>>>>> I don't know much about glmark2, but as with any benchmark, Amdahl's 
>>>>> Law applies.  That means that the total speedup from any enhancement 
>>>>> (such 
>>>>> as a GPU) is limited by the percentage of clock time during which that 
>>>>> enhancement is used.  Not all OpenGL workloads are GPU-bound in terms of 
>>>>> performance.  If the geometry and window size are both really small, then 
>>>>> the performance could very well be CPU-bound.  That's why, for instance, 
>>>>> GLXgears is a poor OpenGL benchmark.  Real-world applications these days 
>>>>> assume the presence of a GPU, so they're going to have no qualms about 
>>>>> trying to render geometries with hundreds of thousands or even millions 
>>>>> of 
>>>>> polygons.  When you try to do that with software OpenGL, you'll see a big 
>>>>> difference vs. GPU acceleration-- a difference that won't necessarily 
>>>>> show 
>>>>> up with tiny geometries. 
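>>>>>
>>>>> (To put rough numbers on that, per Amdahl's Law the total speedup is 
>>>>> 1 / ((1 - P) + P/S), where P is the fraction of frame time spent in 
>>>>> GPU-accelerated work and S is the GPU's speedup on that fraction.  As a 
>>>>> made-up illustration, if only 20% of each frame is GPU-bound and the GPU 
>>>>> is 10x faster there, the overall speedup is 1 / (0.8 + 0.2/10), about 
>>>>> 1.2x, and even an infinitely fast GPU could only deliver 1 / 0.8 = 1.25x.)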
>>>>>
>>>>> You can confirm that that's the case by running glmark2 on your local 
>>>>> display without VirtualGL and forcing the use of the swrast driver.  I 
>>>>> suspect that the difference between swrast and i965 won't be very great 
>>>>> in 
>>>>> that scenario, either.  (I should also mention that Intel GPUs aren't the 
>>>>> fastest in the world, so you're never going to see as much of a speedup-- 
>>>>> nor as large of a speedup in as many cases-- as you would see with AMD or 
>>>>> nVidia.)
>>>>>
>>>>> The other thing is, if the benchmark is attempting to measure 
>>>>> unrealistic frame rates-- like hundreds or thousands of frames per 
>>>>> second-- 
>>>>> then there is a small amount of per-frame overhead introduced by 
>>>>> VirtualGL 
>>>>> that may be limiting that frame rate.  But the reality is that human 
>>>>> vision 
>>>>> can't usually detect more than 60 fps anyhow, so the difference between, 
>>>>> say, 200 fps and 400 fps is not going to matter to an application user.  
>>>>> At 
>>>>> more realistic frame rates, VGL's overhead won't be noticeable.
>>>>>
>>>>> Performance measurement in a VirtualGL environment is more complicated 
>>>>> than performance measurement in a local display environment, which is why 
>>>>> there's a whole section of the VirtualGL User's Guide dedicated to it.  
>>>>> Basically, since VGL introduces a small amount of per-frame overhead but 
>>>>> no 
>>>>> per-vertex overhead, at realistic frame rates and with modern server and 
>>>>> client hardware, it will not appear any slower than a local display.  
>>>>> However, some synthetic benchmarks may record slower performance due to 
>>>>> the 
>>>>> aforementioned overhead.
>>>>>
>>>>>
>>>>> In the meantime I have been trying to get the DE as a whole to run 
>>>>> under acceleration. I record my findings here as a possible clue to my 
>>>>> VGL 
>>>>> issues above. In my .vnc/xstartup.turbovnc I use the following command:
>>>>>
>>>>> #normal start - works with llvmpipe and vglrun
>>>>> #exec startplasma-x11
>>>>>
>>>>> #VGL start
>>>>> exec vglrun +wm startplasma-x11
>>>>>
>>>>> And I also start tvnc with:
>>>>>
>>>>> $vncserver -3dwm
>>>>>
>>>>> I'm not sure whether vglrun, +wm, and -3dwm are redundant or working 
>>>>> against each other, but I've also tried various combinations to no avail.
>>>>>
>>>>> Just use the default xstartup.turbovnc script ('rm 
>>>>> ~/.vnc/xstartup.turbovnc' and re-run /opt/TurboVNC/bin/vncserver to 
>>>>> create 
>>>>> it) and start TurboVNC with '-wm startplasma-x11 -vgl'.
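>>>>>
>>>>> (For example, assuming a session on display :1, roughly:)
>>>>>
>>>>> /opt/TurboVNC/bin/vncserver -kill :1   # stop the existing session
>>>>> rm ~/.vnc/xstartup.turbovnc            # so the default script is regenerated
>>>>> /opt/TurboVNC/bin/vncserver -wm startplasma-x11 -vgl
>>>>>
>>>>> (Or, presumably equivalently, something like '$wm = "startplasma-x11";' 
>>>>> and '$useVGL = 1;' in ~/.vnc/turbovncserver.conf, per the notes below.)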
>>>>>
>>>>> * -3dwm is deprecated.  Use -vgl instead.  -3dwm/-vgl (or setting 
>>>>> '$useVGL = 1;' in /etc/turbovncserver.conf or ~/.vnc/turbovncserver.conf) 
>>>>> simply instructs xstartup.turbovnc to run the window manager startup 
>>>>> script 
>>>>> using 'vglrun +wm'.
>>>>>
>>>>> * Passing -wm to /opt/TurboVNC/bin/vncserver (or setting '$wm = 
>>>>> {script};' in turbovncserver.conf) instructs xstartup.turbovnc to execute 
>>>>> the specified window manager startup script rather than 
>>>>> /etc/X11/xinit/xinitrc.
>>>>>
>>>>> * +wm is a feature of VirtualGL, not TurboVNC.  Normally, if VirtualGL 
>>>>> detects that an OpenGL application is not monitoring StructureNotify 
>>>>> events, VGL will monitor those events on behalf of the application (which 
>>>>> allows VGL to be notified when the window changes size, thus allowing VGL 
>>>>> to change the size of the corresponding Pbuffer.)  This is, however, 
>>>>> unnecessary with window managers and interferes with some of them 
>>>>> (compiz, 
>>>>> specifically), so +wm disables that behavior in VirtualGL.  It's also a 
>>>>> placeholder in case future issues are discovered that are specific to 
>>>>> compositing window managers (+wm could easily be extended to handle those 
>>>>> issues as well.)
>>>>>
>>>>> Interestingly, I had to update the vglrun script to use the full paths 
>>>>> to /usr/lib/libdlfaker.so and the others; otherwise I see the following 
>>>>> in the TVNC logs:
>>>>>
>>>>> ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
>>>>> ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
>>>>>
>>>>> That said, my desktop is still broken even when these errors disappear.
>>>>>
>>>>> Could my various issues be to do with KDE? 
>>>>>
>>>>> The LD_PRELOAD issues can be fixed as described here:
>>>>>
>>>>> https://cdn.rawgit.com/VirtualGL/virtualgl/2.6.3/doc/index.html#hd0012
>>>>>
>>>>>
>>>>
