Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
On 03/26/2013 10:51 PM, Matt Turner wrote:
> On Tue, Mar 26, 2013 at 2:44 PM, Bengt Richter b...@oz.net wrote:
>> uint32_t component_delta2(uint32_t next, uint32_t prev)
>> {
>>     return ((((next&0xff00ff)-(prev&0xff00ff)+0x100)&0xff00ff)+
>>             (((next&0xff00)-(prev&0xff00))&0xff00));
>> }
>
> Does removing all the spaces make it faster? ;)

LOL .. actually I didn't put them in in the first place ;-)

But inlining might make the calling loop faster. Hm, easy to try now ... inlining cut the time almost in half again. I assigned to a volatile so the loop wouldn't get optimized away. I just have a loop in the test kludge like

    else if (strcmp(argv[1],v2)==0){
        for (i=0;i<256*256*256;++i){
            antiO2 = component_delta2(i, i^0xff);
        }
    }

For the above with inlined component_delta2 I get 55ms, vs 95ms not inlined, vs 167ms for the original not inlined, FWIW.

My optimization reflex just got triggered; I didn't look at the full post context to see if it might really be useful or not.

Regards,
Bengt Richter
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
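A quick way to gain confidence in the masked variant above is to compare it against the per-channel version from screenshooter.c that is quoted elsewhere in this thread. The following is only a minimal sketch in C, not the test kludge from the message: count_mismatches() and its xorshift sampling are hypothetical helpers introduced for illustration, and the operators in component_delta2() are written out on the assumption that the masks are combined with bitwise AND.

```c
#include <stdint.h>

/* Per-channel delta, following the component_delta() body quoted in this
 * thread: each channel difference is truncated to 8 bits independently. */
static uint32_t component_delta(uint32_t next, uint32_t prev)
{
	unsigned char dr, dg, db;

	dr = (next >> 16) - (prev >> 16);
	dg = (next >> 8) - (prev >> 8);
	db = (next >> 0) - (prev >> 0);

	return (dr << 16) | (dg << 8) | (db << 0);
}

/* Masked variant: red and blue are subtracted together in one word. Adding
 * 0x100 repays a borrow taken by the blue channel before it can corrupt
 * red; green is handled separately. */
static inline uint32_t component_delta2(uint32_t next, uint32_t prev)
{
	return (((next & 0xff00ff) - (prev & 0xff00ff) + 0x100) & 0xff00ff) +
	       (((next & 0xff00) - (prev & 0xff00)) & 0xff00);
}

/* Hypothetical helper: count inputs on which the two versions disagree,
 * using a pseudo-random sample of (next, prev) pairs. */
static unsigned long count_mismatches(unsigned long samples)
{
	uint32_t a = 0x12345678u, b = 0x9abcdef0u;
	unsigned long bad = 0, i;

	for (i = 0; i < samples; i++) {
		/* xorshift32 steps to vary all 32 bits of both inputs */
		a ^= a << 13; a ^= a >> 17; a ^= a << 5;
		b ^= b << 13; b ^= b >> 17; b ^= b << 5;
		if (component_delta(a, b) != component_delta2(a, b))
			bad++;
	}
	return bad;
}
```

If count_mismatches() returns 0 for a large sample, the two versions agree on those inputs. When timing either function, the result still has to be stored to a volatile, as described above, or the compiler may remove the loop entirely.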
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
On Tue, 26 Mar 2013 03:30:58 +0100
Rune Kjær Svendsen runesv...@gmail.com wrote:

> Marek, do you have an idea on where the current bottleneck is?
>
> I just did a profiling with sysprof, zooming in on the desktop in Weston and moving the mouse wildly around, so that the buffer is completely changed for every frame. I got around 5 fps, which isn't *that* much, but still an order of magnitude better than without your patches.
>
> sysprof says there is 100% CPU usage, but unlike the previous 0.5-FPS recording, it's not in a single function, but spread out over several functions:
>
> 35%   weston_recorder_frame_notify
> 11%   __memcpy_ssse3
> 4.5%  clear_page_c
> 4.3%  output_run
>
> Although I'm not completely sure I'm reading the sysprof output right. weston_recorder_frame_notify, for example, has 35% CPU usage, but none of its child functions has any significant CPU usage. I presume the CPU usage in that function is from calling glReadPixels, although that's not apparent from sysprof:
>
> weston_recorder_frame_notify   39.15%  39.15%
> - - kernel - -                  0.00%   0.01%
> ret_from_intr                   0.00%   0.01%
> __irqentry_text_start           0.00%

Well, if you look at the weston_recorder_frame_notify function, it has a naive loop over each single pixel it processes. component_delta() may get inlined, and output_run() you saw in the profile.

I think it's possible it actually is weston_recorder_frame_notify eating the CPU. Can you get more precise profiling, like hits per source line or instruction?

Thanks,
pq
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
On Tue, Mar 26, 2013 at 2:44 PM, Bengt Richter b...@oz.net wrote:
> uint32_t component_delta2(uint32_t next, uint32_t prev)
> {
>     return ((((next&0xff00ff)-(prev&0xff00ff)+0x100)&0xff00ff)+
>             (((next&0xff00)-(prev&0xff00))&0xff00));
> }

Does removing all the spaces make it faster? ;)
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
I recommend using OpenMP for this kind of pixel processing, specifically the parallel for construct. It's pretty easy to use and you could do wonders with it, i.e. taking advantage of all CPU cores on *any* system. You could also offload the whole thing to another thread and continue there.

Marek

On Tue, Mar 26, 2013 at 6:26 PM, Rune Kjær Svendsen runesv...@gmail.com wrote:
> (I'm re-sending this message because the attachment was too large for mesa-dev, and because I want to add wayland-devel CC. The valgrind output can be found here: http://runeks.dk/files/callgrind.out.11362).
>
> Seems like you are right, Pekka.
>
> I just ran weston through valgrind, and got some interesting results. I ran it like so:
>
> valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes weston
>
> This allows me to get the time spent on a per-instruction level. Now, this is running inside a virtual machine, so it won't be the same as running it natively, but it agrees with the other benchmarks in the sense that it suggests it is the simple calculations inside screenshooter.c that take up most of the CPU time, not calls to outside functions (like it was before with the slow glReadPixels() path).
>
> The callgrind output can be found at the following URL: http://runeks.dk/files/callgrind.out.11362
>
> Open it with KCachegrind, select the function weston_recorder_frame_notify(), and go to the Machine Code tab in the lower right corner to see the interesting stuff.
>
> According to callgrind, a total of 54.39% CPU time is used in the four lines 251, 252, 253 and 255 in screenshooter.c. That's the function component_delta():
>
> dr = (next >> 16) - (prev >> 16);
> dg = (next >>  8) - (prev >>  8);
> db = (next >>  0) - (prev >>  0);
> return (dr << 16) | (dg << 8) | (db << 0);
>
> Additionally, the lines 358, 359, 361, 362, 363 take up 25.9% CPU time. That is the innermost for-loop inside weston_recorder_frame_notify().
> It's the following lines, with the call to component_delta() (line 360) not included, since we've already counted the CPU usage of that:
>
> for (k = 0; k < width; k++) {
>         next = *s++;
>         delta = component_delta(next, *d);
>         *d++ = next;
>         if (run == 0 || delta == prev) {
>                 run++;
>
> And then finally the call to output_run() on line 365 takes up 10.6% CPU time:
>
> p = output_run(p, prev, run);
>
> Inside output_run(), the lines that take up the most CPU time are: 232, 233, 234, 235, 239, 240. They take up 9.94% CPU time. So it's basically the whole while loop in output_run() except the line with the call to __builtin_clz() (line 238):
>
> while (run > 0) {
>         if (run <= 0xe0) {
>                 *p++ = delta | ((run - 1) << 24);
>                 break;
>         }
>         i = 24 - __builtin_clz(run);
>         *p++ = delta | ((i + 0xe0) << 24);
>         run -= 1 << (7 + i);
> }
>
> All of that adds up to 90.89% CPU time spent inside screenshooter.c.
>
> /Rune
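The OpenMP suggestion at the top of this message could look roughly like the sketch below. It is an illustration under stated assumptions, not Weston's code: frame_deltas(), its parameters, and the separate delta buffer are hypothetical. It parallelizes only the per-pixel delta step, one row per iteration, and leaves the run-length encoding in output_run() serial. Rows do not share pixels, so the iterations are independent.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-channel delta, following the component_delta() body quoted above. */
static inline uint32_t component_delta(uint32_t next, uint32_t prev)
{
	unsigned char dr, dg, db;

	dr = (next >> 16) - (prev >> 16);
	dg = (next >> 8) - (prev >> 8);
	db = (next >> 0) - (prev >> 0);

	return (dr << 16) | (dg << 8) | (db << 0);
}

/* Hypothetical helper: fill delta[] with the per-pixel difference between
 * the new frame and the previous one, and update prev[] in place.
 * stride is the row pitch in pixels (an assumption of this sketch). */
static void frame_deltas(const uint32_t *next, uint32_t *prev,
			 uint32_t *delta, int width, int height, int stride)
{
	int y;

	/* Each row is handled by exactly one thread. The guard keeps the
	 * code warning-free when built without OpenMP support. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
	for (y = 0; y < height; y++) {
		const uint32_t *s = next + (size_t)y * stride;
		uint32_t *d = prev + (size_t)y * stride;
		uint32_t *out = delta + (size_t)y * stride;
		int x;

		for (x = 0; x < width; x++) {
			out[x] = component_delta(s[x], d[x]);
			d[x] = s[x];
		}
	}
}
```

With GCC or Clang the pragma only takes effect when building with -fopenmp; otherwise the loop runs on a single thread with the same result. Whether this actually helps the recorder would need to be measured, since the encoding pass that follows is still sequential.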
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
Marek, do you have an idea on where the current bottleneck is?

I just did a profiling with sysprof, zooming in on the desktop in Weston and moving the mouse wildly around, so that the buffer is completely changed for every frame. I got around 5 fps, which isn't *that* much, but still an order of magnitude better than without your patches.

sysprof says there is 100% CPU usage, but unlike the previous 0.5-FPS recording, it's not in a single function, but spread out over several functions:

35%   weston_recorder_frame_notify
11%   __memcpy_ssse3
4.5%  clear_page_c
4.3%  output_run

Although I'm not completely sure I'm reading the sysprof output right. weston_recorder_frame_notify, for example, has 35% CPU usage, but none of its child functions has any significant CPU usage. I presume the CPU usage in that function is from calling glReadPixels, although that's not apparent from sysprof:

weston_recorder_frame_notify   39.15%  39.15%
- - kernel - -                  0.00%   0.01%
ret_from_intr                   0.00%   0.01%
__irqentry_text_start           0.00%   0.01%
irq_exit                        0.00%   0.01%
do_softirq                      0.00%   0.01%
call_softirq                    0.00%   0.01%
__do_softirq                    0.00%   0.01%
blk_done_softirq                0.00%   0.01%
scsi_softirq_done               0.00%   0.01%
scsi_finish_command             0.00%   0.01%
scsi_io_completion              0.00%   0.01%
blk_end_request                 0.00%   0.01%
blk_end_bidi_request            0.00%   0.01%
blk_update_bidi_request         0.00%   0.01%
blk_update_request              0.00%   0.01%
req_bio_endio.isra.46           0.00%   0.01%
bio_endio                       0.00%   0.01%
end_swap_bio_write              0.00%   0.01%
end_page_writeback              0.00%   0.01%
rotate_reclaimable_page         0.01%   0.01%

Another possible bottleneck is simply disk access, although it doesn't seem to be relevant on my system (since I have 100% CPU usage). The 36-second recording I made was 1.3 GB in size, so that's around 36 MB/s.

Kind regards,

Rune Kjær Svendsen
Østerbrogade 111, 3. - 302
2100 København Ø
Tlf.: 2835 0726

On Mon, Mar 18, 2013 at 1:20 AM, Marek Olšák mar...@gmail.com wrote:
> Slowness is not usually a bug. I guess it can be optimized even more. It depends on where the bottleneck is now.
>
> Marek
[Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
Hello list

I'm having problems recording the desktop content using the Weston compositor's built-in recording function. When I start a recording and do something that changes a lot of screen content (like zooming in on the desktop, for example), I get around 0.5 FPS. Using sysprof, I can see that ~98% of CPU is used in the function unpack_XRGB(). krh has told me this is caused by glReadPixels going through a slow path.

I have a Radeon HD 5770 GPU and I'm using mesa git (I've tried the mesa version in the Ubuntu 12.10 repos, and the xorg-edgers PPA, same result).

Does anyone know what the issue could be, or how to debug the problem further?

Doing some debugging, it seems the call to ctx->Driver.ReadPixels() in _mesa_ReadnPixelsARB leads to _mesa_readpixels() being called in readpix.c. I'm attaching some output of gdb that will hopefully be useful. I'm also attaching the debug terminal output of running Weston with the DRM backend.

Let me know if I can provide other useful information.

weston-mesa.log
Description: Binary data

gdb-mesa.log
Description: Binary data
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
2013/3/17 Rune Kjær Svendsen runesv...@gmail.com:
> I'm having problems recording the desktop content using the Weston compositor's built-in recording function. When I start a recording and do something that changes a lot of screen content (like zooming in on the desktop, for example), I get around 0.5 FPS. Using sysprof, I can see that ~98% of CPU is used in the function unpack_XRGB(). krh has told me this is caused by glReadPixels going through a slow path.
>
> Does anyone know what the issue could be, or how to debug the problem further?

This patch series [1] should help. You might want to try it.

[1] http://lists.freedesktop.org/archives/mesa-dev/2013-March/036214.html
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
Thank you very much! This is much better. It's gone from 0.5-ish FPS when zooming in to around 10 FPS, depending on screen content.

So I figure this isn't a bug? I assumed it was a bug, but is the case simply that an efficient glReadPixels path for radeon/gallium doesn't exist? The patch set sure helps in that regard, although it'd be really nice to get 30 FPS consistently, if at all possible.

Thanks again.

/Rune

On Sun, Mar 17, 2013 at 6:46 PM, Andreas Boll wrote:
> This patch series [1] should help. You might want to try it.
>
> [1] http://lists.freedesktop.org/archives/mesa-dev/2013-March/036214.html
Re: [Mesa-dev] Very low framerate when recording desktop content in Weston using mesa git on Radeon 5770 (glReadPixels slow path)
Slowness is not usually a bug. I guess it can be optimized even more. It depends on where the bottleneck is now.

Marek

On Sun, Mar 17, 2013 at 10:14 PM, Rune Kjær Svendsen runesv...@gmail.com wrote:
> Thank you very much! This is much better. It's gone from 0.5-ish FPS when zooming in to around 10 FPS, depending on screen content.
>
> So I figure this isn't a bug? I assumed it was a bug, but is the case simply that an efficient glReadPixels path for radeon/gallium doesn't exist? The patch set sure helps in that regard, although it'd be really nice to get 30 FPS consistently, if at all possible.
>
> Thanks again.
>
> /Rune