Re: Canvas performance on Mac OS
Is that on Sierpinski?

The main question I still have is whether this helps other apps. The usage pattern in Sierpinski is fairly specific and may not translate to all apps (we're still planning on doing this as it is an easy improvement that doesn't seem to have a down-side, but I don't want to break out any champagne if it is just this one benchmark)...

...jim

On 4/16/15 12:58 PM, Johan Vos wrote:

Hi Jim,

On iOS, the performance jumped from 2 fps to 15 fps on my old iPad. Excellent work!

- Johan
Re: Canvas performance on Mac OS
Hi Jim,

Thanks for looking into this.

The patch definitely improves es2 performance on Debian Linux amd64 from around 33fps to around 53fps for me (nVidia FX580).

I've made patched overlay builds of OpenJFX (Linux) 8 and 9 available on my OpenJFX CI server for anyone who wants to try it: http://108.61.191.178/

Will test on OSX tonight.

Cheers,
Chris
Re: Canvas performance on Mac OS
Confirmed, full 60fps performance on 2011 iMac with this fix:

/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java -cp target/DemoFX.jar com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 23 fps: 19 fps: 26 fps: 21 fps: 21 fps: 26 fps: 17

/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk_PATCHED/Contents/Home/bin/java -cp target/DemoFX.jar com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 53 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60

I've uploaded OSX SDK overlay builds containing this webrev to http://108.61.191.178/ if anyone wants to test the fix on their OSX system.

Thanks a lot Jim and team for looking into this!

Cheers,
Chris
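The before/after fps readings posted for the 2011 iMac are easier to compare as steady-state averages. The following is a minimal sketch, not part of DemoFX: the class name FpsLog and both method names are hypothetical, and it assumes the first reading of each run ("fps: 1") is start-up noise that should be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of DemoFX) for summarizing "fps: N" console output. */
public class FpsLog {
    private static final Pattern SAMPLE = Pattern.compile("fps:\\s*(\\d+)");

    /** Extracts every "fps: N" reading from the text, in the order it appears. */
    public static List<Integer> parse(String log) {
        List<Integer> samples = new ArrayList<>();
        Matcher m = SAMPLE.matcher(log);
        while (m.find()) {
            samples.add(Integer.parseInt(m.group(1)));
        }
        return samples;
    }

    /** Mean of the readings that remain after dropping the first `warmup` samples. */
    public static double steadyStateMean(List<Integer> samples, int warmup) {
        if (samples.size() <= warmup) {
            throw new IllegalArgumentException("not enough samples after warm-up");
        }
        double sum = 0;
        for (int i = warmup; i < samples.size(); i++) {
            sum += samples.get(i);
        }
        return sum / (samples.size() - warmup);
    }

    public static void main(String[] args) {
        // Readings copied from the unpatched and patched JDK 1.8.0_45 runs in this thread.
        String unpatched = "fps: 1 fps: 23 fps: 19 fps: 26 fps: 21 fps: 21 fps: 26 fps: 17";
        String patched = "fps: 1 fps: 53 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60";
        System.out.printf("unpatched: %.1f fps%n", steadyStateMean(parse(unpatched), 1));
        System.out.printf("patched:   %.1f fps%n", steadyStateMean(parse(patched), 1));
    }
}
```

Skipping exactly one sample is a judgment call; a longer warm-up window may be more appropriate for runs where the second reading is also low.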
Re: Canvas performance on Mac OS
Hi Chris,

We identified a fairly localized optimization that we might be able to apply to enhance the performance of your Sierpinski program. We don't have any figures yet on whether this will improve other applications/benchmarks that people have been discussing, but the improvements with your Sierpinski program are quite dramatic on a number of platforms and GPUs.

This issue is now being tracked as:

https://javafx-jira.kenai.com/browse/RT-40533

If others could apply the indicated patch to an OpenJFX build and provide feedback on any improvements (or bugs!) that they see, that would help. In the meantime, we have a lot of testing to do to verify the correctness of the changes...

...jim
Re: Canvas performance on Mac OS
This is important. Thanks guys.

Sent from my iPhone
Re: Canvas performance on Mac OS
Hi Jim,

Definitely discrete GPU on the iMac:

java -cp target/DemoFX.jar -Dprism.verbose=true com.chrisnewland.demofx.standalone.Sierpinski

Prism pipeline init order: es2 sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
Prism pipeline name = com.sun.prism.es2.ES2Pipeline
Loading ES2 native library ... prism_es2 succeeded.
GLFactory using com.sun.prism.es2.MacGLFactory
(X) Got class = class com.sun.prism.es2.ES2Pipeline
Initialized prism pipeline: com.sun.prism.es2.ES2Pipeline
Maximum supported texture size: 16384
Maximum texture size clamped to 4096
Non power of two texture support = true
Maximum number of vertex attributes = 16
Maximum number of uniform vertex components = 3072
Maximum number of uniform fragment components = 3072
Maximum number of varying components = 128
Maximum number of texture units usable in a vertex shader = 16
Maximum number of texture units usable in a fragment shader = 16
Graphics Vendor: ATI Technologies Inc.
Renderer: AMD Radeon HD 6970M OpenGL Engine
Version: 2.1 ATI-1.24.38
vsync: true vpipe: true
fps: 1
ES2ResourceFactory: Prism - createStockShader: Solid_Color.frag
ES2ResourceFactory: Prism - createStockShader: FillPgram_Color.frag
Loading Prism common native library ... succeeded.
ES2ResourceFactory: Prism - createStockShader: Texture_Color.frag
ES2ResourceFactory: Prism - createStockShader: Solid_TextureRGB.frag
fps: 23 fps: 18 fps: 25 fps: 18 fps: 23 fps: 23 fps: 19 fps: 25 fps: 18

With software pipeline:

java -cp target/DemoFX.jar -Dprism.verbose=true -Dprism.order=sw com.chrisnewland.demofx.standalone.Sierpinski

Prism pipeline init order: sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
*** Fallback to Prism SW pipeline
Prism pipeline name = com.sun.prism.sw.SWPipeline
(X) Got class = class com.sun.prism.sw.SWPipeline
Initialized prism pipeline: com.sun.prism.sw.SWPipeline
vsync: true vpipe: false
fps: 1
Loading Prism common native library ... succeeded.
fps: 53 fps: 60 fps: 60 fps: 60 fps: 60

But earlier I got similar performance drop for es2 on a Linux system with discrete Nvidia graphics (see my previous email).

I'll see if I can find a Windows box with discrete graphics to test if all platforms exhibit this behaviour.

Cheers,
Chris
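When several people post -Dprism.verbose=true output, the two facts that matter for this thread are which pipeline was initialized and which GPU it ran on. The following is a rough sketch for pulling those out of a captured log. The class and method names are hypothetical, and it assumes the log has one entry per line with the "Initialized prism pipeline:", "Graphics Vendor:" and "Renderer:" prefixes seen in the output above; other JavaFX versions may print different text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: picks a few fields out of captured -Dprism.verbose=true output. */
public class PrismVerboseSummary {

    /** Returns the fields that were found; keys are "pipeline", "vendor" and "renderer". */
    public static Map<String, String> summarize(String log) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String raw : log.split("\\R")) {   // \R matches any line break
            String line = raw.trim();
            copyAfter(line, "Initialized prism pipeline:", "pipeline", out);
            copyAfter(line, "Graphics Vendor:", "vendor", out);
            copyAfter(line, "Renderer:", "renderer", out);
        }
        return out;
    }

    /** Stores the text following `prefix` under `key` when the line starts with that prefix. */
    private static void copyAfter(String line, String prefix, String key, Map<String, String> out) {
        if (line.startsWith(prefix)) {
            out.put(key, line.substring(prefix.length()).trim());
        }
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "Prism pipeline init order: es2 sw",
            "Initialized prism pipeline: com.sun.prism.es2.ES2Pipeline",
            "Graphics Vendor: ATI Technologies Inc.",
            "Renderer: AMD Radeon HD 6970M OpenGL Engine");
        System.out.println(summarize(log));
    }
}
```

The software pipeline log in this thread prints no vendor or renderer lines, so those keys are simply absent from the result in that case.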
Re: Canvas performance on Mac OS
All my MBP numbers are on integrated Intel graphics as well. I tested on an old MBP that only has Intel graphics and on my more recent MBP the Nvidia is deactivated due to a problem with it.

On Wed, Apr 8, 2015 at 1:16 AM, Jim Graham james.gra...@oracle.com wrote:

OK, I took the time to put my rMBP on a diet yesterday and find room to install a 10.10 partition. I get the same numbers for Sierpinski on 10.10, so my theory that something changed in the OGL implementation for 10.10 doesn't hold water.

But, I then tried it using the integrated graphics. I get really poor performance using the integrated Intel 4000 graphics, but I get great numbers on the discrete nVidia 650m. It makes sense that the Intel graphics wouldn't be as powerful as the discrete graphics, but we shouldn't be taxing it that much to make that big of a difference.

Just to be sure - is that iMac a dual graphics system, or is it all-AMD-all-the-time? You can see which GPU is being used if you run it with -Dprism.verbose=true...

...jim

On 4/2/15 4:13 PM, Jim Graham wrote:

On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw. Are you running a newer version of MacOS?

...jim

On 3/31/15 3:40 PM, Chris Newland wrote:

Hi Hervé,

That's a valid question :)

Probably because

a) All my non-UI graphics experience is with immediate-mode / raster systems

b) I'm interested in using JavaFX for particle effects / demoscene / gaming so assumed (perhaps wrongly?) that scenegraph was not the way to go for that due to the very large number of nodes.

Numbers for my Sierpinski filled triangle example:

System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB

java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 23 fps: 18 fps: 25 fps: 18 fps: 23 fps: 23 fps: 19 fps: 25

java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 54 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60

There are never more than 2500 filled triangles on screen.

JDK is 1.8.0_40

I would say there is a performance problem here? (or at least a need for documentation so as to set expectations for gc.fillPolygon).

Best regards,
Chris

On Tue, March 31, 2015 22:00, Hervé Girod wrote:

Why don't you use Nodes rather than Canvas?

Sent from my iPhone

On Mar 31, 2015, at 22:31, Chris Newland cnewl...@chrisnewland.com wrote:

Hi Jim,

Thanks, that makes things much clearer.

I was surprised how much was going on under the hood of GraphicsContext and hoped it was just magic glue that gave the best of GPU acceleration where available and immediate-mode-like simple rasterizing where not.

I've managed to find an anomaly with GraphicsContext.fillPolygon where the software pipeline achieves the full 60fps but ES2 can only manage 30-35fps. It uses lots of overlapping filled triangles so I expect suffers from the problem you've described.

SSCCE:
https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java

Was full frame rate canvas drawing an expected use case for JavaFX or would I be better off with Graphics2D?

Thanks,
Chris

On Mon, March 30, 2015 20:04, Jim Graham wrote:

Hi Chris,

drawLine() is a very simple primitive that can be optimized with a GPU shader. It either looks like a (potentially rotated) rectangle or a rounded rect - and we have optimized shaders for both cases. A large number of drawLine() calls turns into simply accumulating a large vertex list and uploading it to the GPU with an appropriate shader which is very fast.

drawPolygon() is a very complex operation that involves things like:

- dealing with line joins between segments that don't exist for drawLine()
- dealing with only rendering common points of intersection once

To handle all of that complexity we have to involve a rasterizer that takes the entire collection of lines, analyzes the stroke attributes and interactions and computes a coverage mask for each pixel in the region. We do that in software currently for all pipelines.

For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs CPU path rasterization.

For the SW pipeline, drawLine is a simplified case of drawPolygon and so the overhead of lots of calls to drawLine() dominates its performance.

I would expect ES2 to blow the SW pipeline out of the water with drawLine() performance (as long as there are no additional rendering primitives interspersed in the set of lines).

But, both should be on the same footing for the drawPolygon case. Does the ES2 pipeline compare similarly (hopefully better than) the SW pipeline for the polygon case?

One thing I noticed is that we have no optimized case for drawLine() on the SW pipeline. It generates a path containing a single
Re: Canvas performance on Mac OS
Hi Jim,

I'll post the verbose prism output from my iMac when I get home.

Just tried this on my Linux workstation and the performance gap is the same between es2 and sw so I don't think it's an OSX issue.

uname -a
Linux chris 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux

$JAVA_HOME/bin/java -classpath target/DemoFX.jar com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 20 fps: 31 fps: 32 fps: 33 fps: 35 fps: 34 fps: 33

$JAVA_HOME/bin/java -Dprism.order=sw -classpath target/DemoFX.jar com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 54 fps: 56 fps: 60 fps: 59 fps: 60 fps: 61 fps: 61 fps: 60

This is a Xeon W3520 quad-core HT box with an Nvidia Quadro FX 580 graphics card running driver 304.125

Regards,
Chris
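The benchmark figures in this thread are once-per-second frame counts. The thread does not show how Sierpinski computes them, so the following is only a minimal sketch of such a counter, assuming it is fed the nanosecond timestamp that a JavaFX AnimationTimer passes to handle(long). The names FpsCounter and onFrame are hypothetical and do not come from DemoFX.

```java
/** Hypothetical frame-rate counter; not taken from the DemoFX sources. */
final class FpsCounter {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private long windowStart = -1; // timestamp that opened the current window
    private int frames;            // frames seen since windowStart

    /**
     * Call once per rendered frame with a monotonic timestamp in nanoseconds.
     * Returns the measured rate when at least one second has elapsed, else -1.
     */
    int onFrame(long nowNanos) {
        if (windowStart < 0) {
            windowStart = nowNanos; // first frame only opens the window
            return -1;
        }
        frames++;
        long elapsed = nowNanos - windowStart;
        if (elapsed < ONE_SECOND_NANOS) {
            return -1;
        }
        int fps = (int) Math.round(frames * (double) ONE_SECOND_NANOS / elapsed);
        windowStart = nowNanos; // start the next window at this frame
        frames = 0;
        return fps;
    }

    public static void main(String[] args) {
        // Simulate a steady 60 Hz pulse instead of a real AnimationTimer.
        FpsCounter counter = new FpsCounter();
        long step = 16_666_667L;
        for (int i = 0; i <= 180; i++) {
            int fps = counter.onFrame(i * step);
            if (fps >= 0) {
                System.out.println("fps: " + fps);
            }
        }
    }
}
```

In a real application the call to onFrame would sit inside the AnimationTimer's handle(long now) method, next to the canvas drawing code.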
Re: Canvas performance on Mac OS
OK, I took the time to put my rMBP on a diet yesterday and find room to install a 10.10 partition. I get the same numbers for Sierpinski on 10.10, so my theory that something changed in the OGL implementation for 10.10 doesn't hold water.

But, I then tried it using the integrated graphics. I get really poor performance using the integrated Intel 4000 graphics, but I get great numbers on the discrete nVidia 650m. It makes sense that the Intel graphics wouldn't be as powerful as the discrete graphics, but we shouldn't be taxing it that much to make that big of a difference.

Just to be sure - is that iMac a dual graphics system, or is it all-AMD-all-the-time? You can see which GPU is being used if you run it with -Dprism.verbose=true...

...jim
Re: Canvas performance on Mac OS
If I modify the Sierpinski program to use moveTo/lineTo/lineTo on a path and fill the entire path at once the performance improves dramatically on both Intel and nVidia GPUs. It is faster still if I replace the triangles with fillRect calls, but not by as large a margin.

It would appear that we are getting entirely bogged down by uploading lots of little alpha coverage tiles for each individual polygon to the GPU. It is odd that this overhead would be greater for the Intel integrated graphics, which uses main system RAM, than for the nVidia discrete graphics, which uses a separate memory system, but there could be something to be said for the discrete VRAM being faster.

...jim
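Jim's moveTo/lineTo experiment (accumulate every triangle into one path, then fill once) can be sketched as follows. This is not the modified Sierpinski code, which the thread does not show. PathSurface is a hypothetical interface whose method names mirror the path calls on JavaFX GraphicsContext (beginPath, moveTo, lineTo, closePath, fill), and RecordingSurface is a stand-in so that the sketch runs without a JavaFX runtime.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical drawing surface; method names mirror GraphicsContext path calls. */
interface PathSurface {
    void beginPath();
    void moveTo(double x, double y);
    void lineTo(double x, double y);
    void closePath();
    void fill();
}

/** Stand-in that only records which calls were made. */
final class RecordingSurface implements PathSurface {
    final List<String> calls = new ArrayList<>();

    public void beginPath() { calls.add("beginPath"); }
    public void moveTo(double x, double y) { calls.add("moveTo"); }
    public void lineTo(double x, double y) { calls.add("lineTo"); }
    public void closePath() { calls.add("closePath"); }
    public void fill() { calls.add("fill"); }

    long count(String name) {
        return calls.stream().filter(name::equals).count();
    }
}

final class BatchedTriangles {
    /**
     * Adds one closed sub-path per triangle, then issues a single fill.
     * coords holds six values per triangle: x0,y0,x1,y1,x2,y2.
     */
    static void fillAll(PathSurface g, double[] coords) {
        if (coords.length % 6 != 0) {
            throw new IllegalArgumentException("need 6 coordinates per triangle");
        }
        g.beginPath();
        for (int i = 0; i < coords.length; i += 6) {
            g.moveTo(coords[i], coords[i + 1]);
            g.lineTo(coords[i + 2], coords[i + 3]);
            g.lineTo(coords[i + 4], coords[i + 5]);
            g.closePath();
        }
        g.fill(); // one fill for the whole batch instead of one per triangle
    }

    public static void main(String[] args) {
        double[] twoTriangles = { 0, 0, 10, 0, 5, 8, 20, 0, 30, 0, 25, 8 };
        RecordingSurface surface = new RecordingSurface();
        fillAll(surface, twoTriangles);
        System.out.println(surface.calls);
    }
}
```

With a real GraphicsContext the same loop would call the identically named methods on gc. One consequence of the design is that every triangle in a batch shares the current fill paint, so per-triangle colours would need one batch per colour.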
Re: Canvas performance on Mac OS
Hi,

On Sat, Apr 4, 2015 at 10:31 PM, Chris Newland cnewl...@chrisnewland.com wrote:

> Hi Jim,
> -snip
> I think my question is: Does the OpenJFX group think JavaFX is a suitable technology for full frame rate canvas-style graphics or is the degree of indirection between application code and the graphics hardware just too great?

I think there is also a general problem not related to 2D drawing, at least on 10.10.2. For RT-40377 I created a simple node-based alternative which animates _one_ circle, and in full-screen mode I get 25-35 fps on my retina MBP. Maybe it's unrelated, but maybe there is an additional throttle somewhere that also affects your case.

> I would have expected the hardware I've tested on to eat 2500 triangles at 60fps for breakfast even with no GPU acceleration.

Yes, for my case with one circle I would have expected almost no CPU usage, but I still get 15%, which I find quite a lot for rendering one circle 30 times/sec.

> I'm going to knock up a version of this code that uses Graphics2D for comparison.

If you do that, please also include numbers for running that code with Apple Java 6, because there are quite a few people still saying that Apple's Java 6 outperforms Oracle's Java by a lot in 2D graphics.

> Cheers,
> Chris

I don't know what else to do but to lobby here and invest some work in Jira issues with reproducible test cases. There is a huge performance problem on the Mac (I have to admit, I have no Windows machine to compare against myself), with the potential to drive companies like ours, which is seriously considering/testing the technology for our product development, away from it. I would also hope that other people who have encountered this, like the Ultramixer guys, don't give up and keep posting qualified information, making the case for this and supporting the Oracle team with reproducible benchmarks/test cases.

Cheers,
Robert
Re: Canvas performance on Mac OS
Hi Jim,

The first numbers were for my 27" 2011 iMac which runs OSX 10.9 Mavericks.

Here are my numbers for a 2013 MacBook Pro (13" Retina) 2.4 GHz Intel Core i5 / 8GB / Intel Iris 1536 MB / OSX 10.10.2 Yosemite

I don't get 60fps with either pipeline:

java -Dprism.order=es2 -cp target/classes com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 22 fps: 30 fps: 30 fps: 32

java -Dprism.order=sw -cp target/classes com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 28 fps: 34 fps: 33 fps: 34

The OSX Activity Monitor shows the CPU for the Java process near 100% so it's CPU bound for both pipelines. On my iMac where I get 60fps with sw pipeline the CPU is only 50%.

I've written a bunch of other JavaFX effects and it's only the routines that rely on strokePolygon and fillPolygon that don't get 60fps once the polygon count goes above a few hundred.

I've checked the JIT compilation in my application code with JITWatch and everything is compiled and inlined as I'd expect.

GC logs show a GC every couple of seconds freeing up about 30MB:

13.505: [GC (Allocation Failure) [PSYoungGen: 31983K->96K(36352K)] 37760K->5889K(123904K), 0.0013589 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
fps: 32
fps: 32
15.089: [GC (Allocation Failure) [PSYoungGen: 31328K->160K(36352K)] 37121K->5969K(123904K), 0.0008222 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
fps: 33
16.683: [GC (Allocation Failure) [PSYoungGen: 30880K->194K(35840K)] 36689K->6011K(123392K), 0.0005803 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

I think my question is: Does the OpenJFX group think JavaFX is a suitable technology for full frame rate canvas-style graphics or is the degree of indirection between application code and the graphics hardware just too great?

I would have expected the hardware I've tested on to eat 2500 triangles at 60fps for breakfast even with no GPU acceleration.

I'm going to knock up a version of this code that uses Graphics2D for comparison.

Cheers,
Chris
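The "about 30MB every couple of seconds" reading of the GC log in Chris's 4/4 message can be checked mechanically. The sketch below assumes the usual HotSpot layout for these lines, with young-generation occupancy printed as before->after(capacity) in K. The class name YoungGenLog and its methods are hypothetical helpers written for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the PSYoungGen part of one GC log line. */
final class YoungGenLog {
    // Matches e.g. "13.505: [GC (Allocation Failure) [PSYoungGen: 31983K->96K(36352K)] ..."
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+\\.\\d+): \\[GC .*?\\[PSYoungGen: (\\d+)K->(\\d+)K\\(");

    final double timeSec; // JVM uptime at which the collection ran
    final long beforeK;   // young generation occupancy before the collection
    final long afterK;    // young generation occupancy after the collection

    private YoungGenLog(double timeSec, long beforeK, long afterK) {
        this.timeSec = timeSec;
        this.beforeK = beforeK;
        this.afterK = afterK;
    }

    static YoungGenLog parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a PSYoungGen line: " + line);
        }
        return new YoungGenLog(Double.parseDouble(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    /** K reclaimed from the young generation by this collection. */
    long freedK() {
        return beforeK - afterK;
    }

    /** K allocated per second between the previous collection and this one. */
    static double allocRateKPerSec(YoungGenLog prev, YoungGenLog cur) {
        return (cur.beforeK - prev.afterK) / (cur.timeSec - prev.timeSec);
    }

    public static void main(String[] args) {
        YoungGenLog a = parse("13.505: [GC (Allocation Failure) [PSYoungGen: 31983K->96K(36352K)] 37760K->5889K(123904K), 0.0013589 secs]");
        YoungGenLog b = parse("15.089: [GC (Allocation Failure) [PSYoungGen: 31328K->160K(36352K)] 37121K->5969K(123904K), 0.0008222 secs]");
        System.out.println("freed K: " + a.freedK());
        System.out.println("alloc K/s: " + allocRateKPerSec(a, b));
    }
}
```

For the first two log lines this gives 31887K freed by the first collection and an allocation rate of roughly 19,700K per second between the two, which is consistent with the figure quoted in the message.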
Re: Canvas performance on Mac OS
On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw. Are you running a newer version of MacOS?

...jim

On 3/31/15 3:40 PM, Chris Newland wrote:

Hi Hervé,

That's a valid question :)

Probably because
a) All my non-UI graphics experience is with immediate-mode / raster systems
b) I'm interested in using JavaFX for particle effects / demoscene / gaming so assumed (perhaps wrongly?) that scenegraph was not the way to go for that due to the very large number of nodes.

Numbers for my Sierpinski filled triangle example:

System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB

java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 23 fps: 18 fps: 25 fps: 18 fps: 23 fps: 23 fps: 19 fps: 25

java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
fps: 1 fps: 54 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60

There are never more than 2500 filled triangles on screen. JDK is 1.8.0_40

I would say there is a performance problem here? (or at least a need for documentation so as to set expectations for gc.fillPolygon).

Best regards,
Chris
Re: Canvas performance on Mac OS
Why don't you use Nodes rather than Canvas ? Sent from my iPhone On Mar 31, 2015, at 22:31, Chris Newland cnewl...@chrisnewland.com wrote: Hi Jim, Thanks, that makes things much clearer. I was surprised how much was going on under the hood of GraphicsContext and hoped it was just magic glue that gave the best of GPU acceleration where available and immediate-mode-like simple rasterizing where not. I've managed to find an anomaly with GraphicsContext.fillPolygon where the software pipeline achieves the full 60fps but ES2 can only manage 30-35fps. It uses lots of overlapping filled triangles so I expect suffers from the problem you've described. SSCCE: https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java Was full frame rate canvas drawing an expected use case for JavaFX or would I be better off with Graphics2D? Thanks, Chris On Mon, March 30, 2015 20:04, Jim Graham wrote: Hi Chris, drawLine() is a very simple primitive that can be optimized with a GPU shader. It either looks like a (potentially rotated) rectangle or a rounded rect - and we have optimized shaders for both cases. A large number of drawLine() calls turns into simply accumulating a large vertex list and uploading it to the GPU with an appropriate shader which is very fast. drawPolygon() is a very complex operation that involves things like: - dealing with line joins between segments that don't exist for drawLine() - dealing with only rendering common points of intersection once To handle all of that complexity we have to involve a rasterizer that takes the entire collection of lines, analyzes the stroke attributes and interactions and computes a coverage mask for each pixel in the region. We do that in software currently for all pipelines. For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs CPU path rasterization. 
For the SW pipeline, drawLine is a simplified case of drawPolygon and so the overhead of lots of calls to drawLine() dominates its performance. I would expect ES2 to blow the SW pipeline out of the water with drawLine() performance (as long as there are no additional rendering primitives interspersed in the set of lines). But, both should be on the same footing for the drawPolygon case. Does the ES2 pipeline compare similarly (hopefully better than) the SW pipeline for the polygon case? One thing I noticed is that we have no optimized case for drawLine() on the SW pipeline. It generates a path containing a single MOVETO and LINETO and feeds it to the generalized path rasterizer when it could instead compute the rounded/square rectangle and render it more directly. If we added that support then I'd expect the SW pipeline to perform the set of drawLine calls faster than drawPolygon as well... ...jim On 3/28/15 3:22 AM, Chris Newland wrote: Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. 
Will file a Jira if I can contribute a bit more than "feels slow" ;) Cheers, Chris On Sat, March 28, 2015 10:06, Robert Krüger wrote: This is consistent with what I am observing. Is this something that Oracle is aware of? Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons to use JFX for me is to be able to develop with one code base for at least OSX and Windows and the official statement what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms" and the fact that this is clearly not the case currently (8u40) as soon as I do something else than simple forms, I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that. Is Mac OSX a second-class citizen as far as dev
Re: Canvas performance on Mac OS
Hi Jim, Thanks, that makes things much clearer. I was surprised how much was going on under the hood of GraphicsContext and hoped it was just magic glue that gave the best of GPU acceleration where available and immediate-mode-like simple rasterizing where not. I've managed to find an anomaly with GraphicsContext.fillPolygon where the software pipeline achieves the full 60fps but ES2 can only manage 30-35fps. It uses lots of overlapping filled triangles so I expect suffers from the problem you've described. SSCCE: https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java Was full frame rate canvas drawing an expected use case for JavaFX or would I be better off with Graphics2D? Thanks, Chris On Mon, March 30, 2015 20:04, Jim Graham wrote: Hi Chris, drawLine() is a very simple primitive that can be optimized with a GPU shader. It either looks like a (potentially rotated) rectangle or a rounded rect - and we have optimized shaders for both cases. A large number of drawLine() calls turns into simply accumulating a large vertex list and uploading it to the GPU with an appropriate shader which is very fast. drawPolygon() is a very complex operation that involves things like: - dealing with line joins between segments that don't exist for drawLine() - dealing with only rendering common points of intersection once To handle all of that complexity we have to involve a rasterizer that takes the entire collection of lines, analyzes the stroke attributes and interactions and computes a coverage mask for each pixel in the region. We do that in software currently for all pipelines. For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs CPU path rasterization. For the SW pipeline, drawLine is a simplified case of drawPolygon and so the overhead of lots of calls to drawLine() dominates its performance. 
I would expect ES2 to blow the SW pipeline out of the water with drawLine() performance (as long as there are no additional rendering primitives interspersed in the set of lines). But, both should be on the same footing for the drawPolygon case. Does the ES2 pipeline compare similarly (hopefully better than) the SW pipeline for the polygon case? One thing I noticed is that we have no optimized case for drawLine() on the SW pipeline. It generates a path containing a single MOVETO and LINETO and feeds it to the generalized path rasterizer when it could instead compute the rounded/square rectangle and render it more directly. If we added that support then I'd expect the SW pipeline to perform the set of drawLine calls faster than drawPolygon as well... ...jim On 3/28/15 3:22 AM, Chris Newland wrote: Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. Will file a Jira if I can contribute a bit more than feels slow ;) Cheers, Chris On Sat, March 28, 2015 10:06, Robert Krüger wrote: This is consistent with what I am observing. Is this something that Oracle is aware of? 
Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons for me to use JFX is to be able to develop with one code base for at least OSX and Windows, given the official statement of what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms", and given that this is clearly not the case currently (8u40), because as soon as I do anything other than simple forms I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that. Is Mac OSX a second-class citizen as far as dev resources are concerned? Tobi and Chris, have you filed Jira Issues on Mac graphics performance that can be tracked? I will file an issue with a simple test case and hope for the best. On
Re: Canvas performance on Mac OS
Hi Hervé, That's a valid question :) Probably because a) All my non-UI graphics experience is with immediate-mode / raster systems b) I'm interested in using JavaFX for particle effects / demoscene / gaming so assumed (perhaps wrongly?) that scenegraph was not the way to go for that due to the very large number of nodes. Numbers for my Sierpinski filled triangle example: System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski fps: 1 fps: 23 fps: 18 fps: 25 fps: 18 fps: 23 fps: 23 fps: 19 fps: 25 java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski fps: 1 fps: 54 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 fps: 60 There are never more than 2500 filled triangles on screen. JDK is 1.8.0_40 I would say there is a performance problem here? (or at least a need for documentation so as to set expectations for gc.fillPolygon). Best regards, Chris On Tue, March 31, 2015 22:00, Hervé Girod wrote: Why don't you use Nodes rather than Canvas ? Sent from my iPhone On Mar 31, 2015, at 22:31, Chris Newland cnewl...@chrisnewland.com wrote: Hi Jim, Thanks, that makes things much clearer. I was surprised how much was going on under the hood of GraphicsContext and hoped it was just magic glue that gave the best of GPU acceleration where available and immediate-mode-like simple rasterizing where not. I've managed to find an anomaly with GraphicsContext.fillPolygon where the software pipeline achieves the full 60fps but ES2 can only manage 30-35fps. It uses lots of overlapping filled triangles so I expect suffers from the problem you've described. SSCCE: https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java Was full frame rate canvas drawing an expected use case for JavaFX or would I be better off with Graphics2D?
Thanks, Chris On Mon, March 30, 2015 20:04, Jim Graham wrote: Hi Chris, drawLine() is a very simple primitive that can be optimized with a GPU shader. It either looks like a (potentially rotated) rectangle or a rounded rect - and we have optimized shaders for both cases. A large number of drawLine() calls turns into simply accumulating a large vertex list and uploading it to the GPU with an appropriate shader which is very fast. drawPolygon() is a very complex operation that involves things like: - dealing with line joins between segments that don't exist for drawLine() - dealing with only rendering common points of intersection once To handle all of that complexity we have to involve a rasterizer that takes the entire collection of lines, analyzes the stroke attributes and interactions and computes a coverage mask for each pixel in the region. We do that in software currently for all pipelines. For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs CPU path rasterization. For the SW pipeline, drawLine is a simplified case of drawPolygon and so the overhead of lots of calls to drawLine() dominates its performance. I would expect ES2 to blow the SW pipeline out of the water with drawLine() performance (as long as there are no additional rendering primitives interspersed in the set of lines). But, both should be on the same footing for the drawPolygon case. Does the ES2 pipeline compare similarly (hopefully better than) the SW pipeline for the polygon case? One thing I noticed is that we have no optimized case for drawLine() on the SW pipeline. It generates a path containing a single MOVETO and LINETO and feeds it to the generalized path rasterizer when it could instead compute the rounded/square rectangle and render it more directly. If we added that support then I'd expect the SW pipeline to perform the set of drawLine calls faster than drawPolygon as well... 
...jim On 3/28/15 3:22 AM, Chris Newland wrote: Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. Will file a Jira if I can contribute a bit more than feels slow ;)
Re: Canvas performance on Mac OS
On 3/30/15 12:04 PM, Jim Graham wrote: drawPolygon() is a very complex operation that involves things like: - dealing with only rendering common points of intersection once An example of the distinction here - try a test case where you execute the exact same diagonal line primitive 1,000 times on top of itself (identical coordinates for all of them). Then change the example to use a Polygon that goes from point A to point B and back, over itself 1,000 times. The result of all of those lines will have jagged edges even though the lines themselves are antialiased because the partially filled pixels along the edges slowly accumulate opacity until their carefully blended edges get lost in the accumulated error. The result of the polygon will be identical to just drawing an antialiased line from point A to point B because it is turned into a single coverage result by the software rasterizer. Another similar example - set an opacity of 0.1 on all of those rendering calls. The (multi-)drawLine example will look like an opaque line of 1.0 opacity, but the polygon will still look like it has an opacity of 0.1 because the coverages are accumulated across the entire polygon before any rendering occurs and so each pixel is only blended once... ...jim
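The opacity example in the message above can be checked with a little arithmetic. The following is only a sketch, assuming plain source-over compositing of alpha values; the class and method names are made up for illustration and nothing here is taken from the Prism sources. It compares blending the same 0.1-opacity primitive 1,000 times (the multi-drawLine case) with blending it once (the polygon case, where coverage is accumulated before any rendering).

```java
public class AlphaAccumulation {
    /** Source-over blend of a source alpha onto an existing destination alpha. */
    public static double blendOver(double dstAlpha, double srcAlpha) {
        return srcAlpha + dstAlpha * (1.0 - srcAlpha);
    }

    /** Alpha of a pixel after the same primitive has been blended onto it `passes` times. */
    public static double accumulate(double srcAlpha, int passes) {
        double alpha = 0.0; // start from a fully transparent pixel
        for (int i = 0; i < passes; i++) {
            alpha = blendOver(alpha, srcAlpha);
        }
        return alpha;
    }

    public static void main(String[] args) {
        // 1,000 identical lines at opacity 0.1: every call blends again.
        System.out.println("separate lines: " + accumulate(0.1, 1000));
        // One polygon tracing the same path 1,000 times: blended once.
        System.out.println("single polygon: " + accumulate(0.1, 1));
        // The same effect on a half-covered antialiased edge pixel.
        System.out.println("edge pixel, 10 passes: " + accumulate(0.5, 10));
    }
}
```

The edge-pixel case is the reason the repeated lines end up looking jagged: partially covered pixels drift toward full opacity with every extra pass.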
Re: Canvas performance on Mac OS
Hi Chris, drawLine() is a very simple primitive that can be optimized with a GPU shader. It either looks like a (potentially rotated) rectangle or a rounded rect - and we have optimized shaders for both cases. A large number of drawLine() calls turns into simply accumulating a large vertex list and uploading it to the GPU with an appropriate shader which is very fast. drawPolygon() is a very complex operation that involves things like: - dealing with line joins between segments that don't exist for drawLine() - dealing with only rendering common points of intersection once To handle all of that complexity we have to involve a rasterizer that takes the entire collection of lines, analyzes the stroke attributes and interactions and computes a coverage mask for each pixel in the region. We do that in software currently for all pipelines. For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs CPU path rasterization. For the SW pipeline, drawLine is a simplified case of drawPolygon and so the overhead of lots of calls to drawLine() dominates its performance. I would expect ES2 to blow the SW pipeline out of the water with drawLine() performance (as long as there are no additional rendering primitives interspersed in the set of lines). But, both should be on the same footing for the drawPolygon case. Does the ES2 pipeline compare similarly (hopefully better than) the SW pipeline for the polygon case? One thing I noticed is that we have no optimized case for drawLine() on the SW pipeline. It generates a path containing a single MOVETO and LINETO and feeds it to the generalized path rasterizer when it could instead compute the rounded/square rectangle and render it more directly. If we added that support then I'd expect the SW pipeline to perform the set of drawLine calls faster than drawPolygon as well... 
...jim On 3/28/15 3:22 AM, Chris Newland wrote: Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. Will file a Jira if I can contribute a bit more than feels slow ;) Cheers, Chris On Sat, March 28, 2015 10:06, Robert Krüger wrote: This is consistent with what I am observing. Is this something that Oracle is aware of? Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons for me to use JFX is to be able to develop with one code base for at least OSX and Windows, given the official statement of what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms", and given that this is clearly not the case currently (8u40), because as soon as I do anything other than simple forms I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that.
Is Mac OSX a second-class citizen as far as dev resources are concerned? Tobi and Chris, have you filed Jira Issues on Mac graphics performance that can be tracked? I will file an issue with a simple test case and hope for the best. On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland cnewl...@chrisnewland.com wrote: Possibly related: I can reproduce a massive (90%) performance drop on OSX between drawing a wireframe polygon on a Canvas using a series of gc.strokeLine(double x1, double y1, double x2, double y2) commands versus using a single gc.strokePolygon(double[] xPoints, double[] yPoints, int count) command. Creating the polygons manually with strokeLine() is significantly faster using the ES2Pipeline on OSX. This is reproducible in a little GitHub JavaFX benchmarking project I've created: https://github.com/chriswhocodes/DemoFX Build with ant Run with: # use strokeLine ./run.sh -c 5000 -m line result: 60 (sixty) fps # use strokePolygon ./run.sh -c 5000 -m poly result: 6 (six) fps System is 2011 iMac 27 / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB
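Jim's remark that a drawLine() "looks like a (potentially rotated) rectangle" can be made concrete with a small geometry sketch. This is an illustration only, not the Prism implementation: the class name, the method name and the restriction to a flat (butt) cap are all assumptions. It computes the four corners of the rectangle covered by a line of a given stroke width, which is the kind of quad a GPU pipeline could batch into a vertex list.

```java
public class LineQuad {
    /**
     * Returns the corners {x0,y0, x1,y1, x2,y2, x3,y3} of the rectangle covered
     * by a flat-capped line of the given width from (x1,y1) to (x2,y2).
     */
    public static double[] corners(double x1, double y1, double x2, double y2, double width) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        double len = Math.hypot(dx, dy);
        if (len == 0.0) {
            throw new IllegalArgumentException("zero-length line has no direction");
        }
        // Normal to the line direction, scaled to half the stroke width.
        double nx = -dy / len * (width / 2.0);
        double ny = dx / len * (width / 2.0);
        return new double[] {
            x1 + nx, y1 + ny,   // start point, offset to one side
            x2 + nx, y2 + ny,   // end point, same side
            x2 - nx, y2 - ny,   // end point, opposite side
            x1 - nx, y1 - ny    // start point, opposite side
        };
    }

    public static void main(String[] args) {
        double[] quad = corners(0, 0, 10, 10, 2);
        for (int i = 0; i < quad.length; i += 2) {
            System.out.println("(" + quad[i] + ", " + quad[i + 1] + ")");
        }
    }
}
```

A square cap would extend the rectangle by half the width at each end, and a round cap corresponds to the rounded-rect case mentioned in the message. A polygon cannot be reduced to such independent quads because of the joins and overlaps Jim describes.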
Re: Canvas performance on Mac OS
I have filed this now: https://javafx-jira.kenai.com/browse/RT-40377 On Sat, Mar 28, 2015 at 11:06 AM, Robert Krüger krue...@lesspain.de wrote: This is consistent with what I am observing. Is this something that Oracle is aware of? Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons for me to use JFX is to be able to develop with one code base for at least OSX and Windows, given the official statement of what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms", and given that this is clearly not the case currently (8u40), because as soon as I do anything other than simple forms I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that. Is Mac OSX a second-class citizen as far as dev resources are concerned? Tobi and Chris, have you filed Jira Issues on Mac graphics performance that can be tracked? I will file an issue with a simple test case and hope for the best. On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland cnewl...@chrisnewland.com wrote: Possibly related: I can reproduce a massive (90%) performance drop on OSX between drawing a wireframe polygon on a Canvas using a series of gc.strokeLine(double x1, double y1, double x2, double y2) commands versus using a single gc.strokePolygon(double[] xPoints, double[] yPoints, int count) command. Creating the polygons manually with strokeLine() is significantly faster using the ES2Pipeline on OSX.
This is reproducible in a little GitHub JavaFX benchmarking project I've created: https://github.com/chriswhocodes/DemoFX Build with ant Run with: # use strokeLine ./run.sh -c 5000 -m line result: 60 (sixty) fps # use strokePolygon ./run.sh -c 5000 -m poly result: 6 (six) fps System is 2011 iMac 27 / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB Looking at the code paths in javafx.scene.canvas.GraphicsContext: gc.strokeLine() maps to writeOp4(x1, y1, x2, y2, NGCanvas.STROKE_LINE) gc.strokePolygon() maps to writePoly(xPoints, yPoints, nPoints, true, NGCanvas.STROKE_PATH) which involves significantly more work with adding to and flushing a GrowableDataBuffer. I've not had time to dig any deeper than this but it's surely a bug when building a poly manually is 10x faster than using the convenience method. Cheers, Chris On Fri, March 27, 2015 21:26, Tobias Bley wrote: In my opinion the whole graphics performance on MacOSX isn’t good at all with JavaFX…. Am 27.03.2015 um 22:10 schrieb Robert Krüger krue...@lesspain.de: The bad full screen performance is without the arcs. It is just one call to fillRect, two to strokeOval and one to fillOval, that's all. I will build a simple test case and file an issue. On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham james.gra...@oracle.com wrote: Hi Robert, Please file a Jira issue with a simple test case. Arcs are handled as a generalized shape rather than via a predetermined shader, but it shouldn't be that slow. Something else may be going on. Another test might be to replace the arcs with rectangles or ellipses and see if the performance changes... ...jim On 3/27/15 1:52 PM, Robert Krüger wrote: Hi, I have a super-simple animation implemented using AnimationTimer and Canvas where the canvas just performs a few draw operations, i.e. fills the screen with a color and then draws and fills 2-3 circles and I have already observed that each drawing operation I add, results in significant CPU load (e.g. 
when I draw 10 arcs in addition to the circles, the CPU load goes up to 30-40% on a MacBook Pro for a Canvas size of 600x600(!)). Now I tested the animation in full screen mode (only with a few circles) and playback is unusable for a serious application (very choppy). Is 2D canvas performance known to be very bad on Mac or am I doing something wrong? Are there workarounds for this? Thanks, Robert -- Robert Krüger Managing Partner Lesspain GmbH Co. KG www.lesspain-software.com
Re: Canvas performance on Mac OS
This is consistent with what I am observing. Is this something that Oracle is aware of? Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons for me to use JFX is to be able to develop with one code base for at least OSX and Windows, given the official statement of what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms", and given that this is clearly not the case currently (8u40), because as soon as I do anything other than simple forms I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that. Is Mac OSX a second-class citizen as far as dev resources are concerned? Tobi and Chris, have you filed Jira Issues on Mac graphics performance that can be tracked? I will file an issue with a simple test case and hope for the best. On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland cnewl...@chrisnewland.com wrote: Possibly related: I can reproduce a massive (90%) performance drop on OSX between drawing a wireframe polygon on a Canvas using a series of gc.strokeLine(double x1, double y1, double x2, double y2) commands versus using a single gc.strokePolygon(double[] xPoints, double[] yPoints, int count) command. Creating the polygons manually with strokeLine() is significantly faster using the ES2Pipeline on OSX.
This is reproducible in a little GitHub JavaFX benchmarking project I've created: https://github.com/chriswhocodes/DemoFX Build with ant Run with: # use strokeLine ./run.sh -c 5000 -m line result: 60 (sixty) fps # use strokePolygon ./run.sh -c 5000 -m poly result: 6 (six) fps System is 2011 iMac 27 / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB Looking at the code paths in javafx.scene.canvas.GraphicsContext: gc.strokeLine() maps to writeOp4(x1, y1, x2, y2, NGCanvas.STROKE_LINE) gc.strokePolygon() maps to writePoly(xPoints, yPoints, nPoints, true, NGCanvas.STROKE_PATH) which involves significantly more work with adding to and flushing a GrowableDataBuffer. I've not had time to dig any deeper than this but it's surely a bug when building a poly manually is 10x faster than using the convenience method. Cheers, Chris On Fri, March 27, 2015 21:26, Tobias Bley wrote: In my opinion the whole graphics performance on MacOSX isn’t good at all with JavaFX…. Am 27.03.2015 um 22:10 schrieb Robert Krüger krue...@lesspain.de: The bad full screen performance is without the arcs. It is just one call to fillRect, two to strokeOval and one to fillOval, that's all. I will build a simple test case and file an issue. On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham james.gra...@oracle.com wrote: Hi Robert, Please file a Jira issue with a simple test case. Arcs are handled as a generalized shape rather than via a predetermined shader, but it shouldn't be that slow. Something else may be going on. Another test might be to replace the arcs with rectangles or ellipses and see if the performance changes... ...jim On 3/27/15 1:52 PM, Robert Krüger wrote: Hi, I have a super-simple animation implemented using AnimationTimer and Canvas where the canvas just performs a few draw operations, i.e. fills the screen with a color and then draws and fills 2-3 circles and I have already observed that each drawing operation I add, results in significant CPU load (e.g. 
when I draw 10 arcs in addition to the circles, the CPU load goes up to 30-40% on a MacBook Pro for a Canvas size of 600x600(!)). Now I tested the animation in full screen mode (only with a few circles) and playback is unusable for a serious application (very choppy). Is 2D canvas performance known to be very bad on Mac or am I doing something wrong? Are there workarounds for this? Thanks, Robert -- Robert Krüger Managing Partner Lesspain GmbH Co. KG www.lesspain-software.com
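For readers who want to reproduce the strokeLine versus strokePolygon comparison quoted above, the manual variant boils down to turning the polygon's vertex arrays into a closed chain of segments. The sketch below shows only that conversion in plain Java; the class, the Segment type and the toSegments method are hypothetical names, not part of DemoFX or JavaFX.

```java
import java.util.ArrayList;
import java.util.List;

public class PolygonAsLines {
    /** One segment, in the argument order of gc.strokeLine(x1, y1, x2, y2). */
    public static final class Segment {
        public final double x1, y1, x2, y2;

        public Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    /** Splits a closed polygon, given as in strokePolygon(xPoints, yPoints, nPoints), into segments. */
    public static List<Segment> toSegments(double[] xPoints, double[] yPoints, int nPoints) {
        List<Segment> segments = new ArrayList<>();
        if (nPoints < 2) {
            return segments; // nothing to connect
        }
        for (int i = 0; i < nPoints; i++) {
            int next = (i + 1) % nPoints; // wrap around so the last vertex joins the first
            segments.add(new Segment(xPoints[i], yPoints[i], xPoints[next], yPoints[next]));
        }
        return segments;
    }

    public static void main(String[] args) {
        double[] xs = {0, 100, 50};
        double[] ys = {0, 0, 80};
        for (Segment s : toSegments(xs, ys, 3)) {
            // In a Canvas this would be: gc.strokeLine(s.x1, s.y1, s.x2, s.y2);
            System.out.println(s.x1 + "," + s.y1 + " -> " + s.x2 + "," + s.y2);
        }
    }
}
```

The two approaches do not necessarily produce identical pixels: separate lines have no joins between them, and overlapping ends are blended more than once.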
Re: Canvas performance on Mac OS
Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. Will file a Jira if I can contribute a bit more than feels slow ;) Cheers, Chris On Sat, March 28, 2015 10:06, Robert Krüger wrote: This is consistent with what I am observing. Is this something that Oracle is aware of? Looking at Jira, I don't see that anyone is working on this: https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance) Given that one of the main reasons for me to use JFX is to be able to develop with one code base for at least OSX and Windows, given the official statement of what JavaFX is for, i.e. "JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms", and given that this is clearly not the case currently (8u40), because as soon as I do anything other than simple forms I run into performance/quality problems on the Mac, I am a bit unsure what to make of all that.
Is Mac OSX a second-class citizen as far as dev resources are concerned? Tobi and Chris, have you filed Jira Issues on Mac graphics performance that can be tracked? I will file an issue with a simple test case and hope for the best. On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland cnewl...@chrisnewland.com wrote: Possibly related: I can reproduce a massive (90%) performance drop on OSX between drawing a wireframe polygon on a Canvas using a series of gc.strokeLine(double x1, double y1, double x2, double y2) commands versus using a single gc.strokePolygon(double[] xPoints, double[] yPoints, int count) command. Creating the polygons manually with strokeLine() is significantly faster using the ES2Pipeline on OSX. This is reproducible in a little GitHub JavaFX benchmarking project I've created: https://github.com/chriswhocodes/DemoFX Build with ant Run with: # use strokeLine ./run.sh -c 5000 -m line result: 60 (sixty) fps # use strokePolygon ./run.sh -c 5000 -m poly result: 6 (six) fps System is 2011 iMac 27 / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB Looking at the code paths in javafx.scene.canvas.GraphicsContext: gc.strokeLine() maps to writeOp4(x1, y1, x2, y2, NGCanvas.STROKE_LINE) gc.strokePolygon() maps to writePoly(xPoints, yPoints, nPoints, true, NGCanvas.STROKE_PATH) which involves significantly more work with adding to and flushing a GrowableDataBuffer. I've not had time to dig any deeper than this but it's surely a bug when building a poly manually is 10x faster than using the convenience method. Cheers, Chris On Fri, March 27, 2015 21:26, Tobias Bley wrote: In my opinion the whole graphics performance on MacOSX isn’t good at all with JavaFX…. On 27.03.2015 at 22:10, Robert Krüger krue...@lesspain.de wrote: The bad full screen performance is without the arcs. It is just one call to fillRect, two to strokeOval and one to fillOval, that's all. I will build a simple test case and file an issue.
On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham james.gra...@oracle.com wrote: Hi Robert, Please file a Jira issue with a simple test case. Arcs are handled as a generalized shape rather than via a predetermined shader, but it shouldn't be that slow. Something else may be going on. Another test might be to replace the arcs with rectangles or ellipses and see if the performance changes... ...jim On 3/27/15 1:52 PM, Robert Krüger wrote: Hi, I have a super-simple animation implemented using AnimationTimer and Canvas where the canvas just performs a few draw operations, i.e. fills the screen with a color and then draws and fills 2-3 circles, and I have already observed that each drawing operation I add results in significant CPU load (e.g. when I draw 10 arcs in addition to the circles, the CPU load goes up to 30-40% on a MacBook Pro for a Canvas size of 600x600(!)). Now I tested the animation in full screen mode (only with a
Re: Canvas performance on Mac OS
On Sat, Mar 28, 2015 at 11:22 AM, Chris Newland cnewl...@chrisnewland.com wrote: Hi Robert, I've not filed a Jira yet as I was hoping to find time to investigate thoroughly but when I saw your question I thought I'd better add my findings. I believe the issue is in the ES2Pipeline as if I run with -Dprism.order=sw then strokePolygon outperforms the series of strokeLine commands as expected: java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line Result: 44fps java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly Result: 60fps Will see if I can find the root cause as I've got plenty more examples where ES2Pipeline performs horribly on my Mac which should have no problem throwing around a few thousand polys. I realise there's a *lot* of indirection involved in making JavaFX support such a wide range of underlying graphics systems but I do think there's a bug here. Will file a Jira if I can contribute a bit more than feels slow ;) Cheers, Chris Great, thanks!
Re: Canvas performance on Mac OS
Hi Robert, Please file a Jira issue with a simple test case. Arcs are handled as a generalized shape rather than via a predetermined shader, but it shouldn't be that slow. Something else may be going on. Another test might be to replace the arcs with rectangles or ellipses and see if the performance changes... ...jim On 3/27/15 1:52 PM, Robert Krüger wrote: Hi, I have a super-simple animation implemented using AnimationTimer and Canvas where the canvas just performs a few draw operations, i.e. fills the screen with a color and then draws and fills 2-3 circles, and I have already observed that each drawing operation I add results in significant CPU load (e.g. when I draw 10 arcs in addition to the circles, the CPU load goes up to 30-40% on a MacBook Pro for a Canvas size of 600x600(!)). Now I tested the animation in full screen mode (only with a few circles) and playback is unusable for a serious application (very choppy). Is 2D canvas performance known to be very bad on Mac or am I doing something wrong? Are there workarounds for this? Thanks, Robert
Re: Canvas performance on Mac OS
The bad full screen performance is without the arcs. It is just one call to fillRect, two to strokeOval and one to fillOval, that's all. I will build a simple test case and file an issue. On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham james.gra...@oracle.com wrote: Hi Robert, Please file a Jira issue with a simple test case. Arcs are handled as a generalized shape rather than via a predetermined shader, but it shouldn't be that slow. Something else may be going on. Another test might be to replace the arcs with rectangles or ellipses and see if the performance changes... ...jim On 3/27/15 1:52 PM, Robert Krüger wrote: Hi, I have a super-simple animation implemented using AnimationTimer and Canvas where the canvas just performs a few draw operations, i.e. fills the screen with a color and then draws and fills 2-3 circles, and I have already observed that each drawing operation I add results in significant CPU load (e.g. when I draw 10 arcs in addition to the circles, the CPU load goes up to 30-40% on a MacBook Pro for a Canvas size of 600x600(!)). Now I tested the animation in full screen mode (only with a few circles) and playback is unusable for a serious application (very choppy). Is 2D canvas performance known to be very bad on Mac or am I doing something wrong? Are there workarounds for this? Thanks, Robert -- Robert Krüger Managing Partner Lesspain GmbH Co. KG www.lesspain-software.com
Re: Canvas performance on Mac OS
In my opinion, graphics performance as a whole isn't good at all on Mac OS X with JavaFX.
Re: Canvas performance on Mac OS
Possibly related: I can reproduce a massive (90%) performance drop on OS X when drawing a wireframe polygon on a Canvas with a single gc.strokePolygon(double[] xPoints, double[] yPoints, int nPoints) command instead of a series of gc.strokeLine(double x1, double y1, double x2, double y2) commands. Creating the polygons manually with strokeLine() is significantly faster using the ES2Pipeline on OS X.

This is reproducible in a little GitHub JavaFX benchmarking project I've created: https://github.com/chriswhocodes/DemoFX

Build with ant. Run with:

    # use strokeLine
    ./run.sh -c 5000 -m line
    result: 60 (sixty) fps

    # use strokePolygon
    ./run.sh -c 5000 -m poly
    result: 6 (six) fps

System is 2011 iMac 27 / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB

Looking at the code paths in javafx.scene.canvas.GraphicsContext:

    gc.strokeLine() maps to writeOp4(x1, y1, x2, y2, NGCanvas.STROKE_LINE)
    gc.strokePolygon() maps to writePoly(xPoints, yPoints, nPoints, true, NGCanvas.STROKE_PATH)

The writePoly path involves significantly more work adding to and flushing a GrowableDataBuffer. I've not had time to dig any deeper than this, but it's surely a bug when building a poly manually is 10x faster than using the convenience method.

Cheers, Chris
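The workaround Chris describes, building the polygon outline from individual strokeLine calls instead of one strokePolygon call, can be sketched as follows. This is an illustrative sketch and not the DemoFX code; PolygonOutline, LineSink and strokeAsLines are hypothetical names. The sink is kept as an interface so the example runs without JavaFX; with a GraphicsContext gc, the method reference gc::strokeLine should fit as the sink, since strokeLine takes four doubles.

```java
import java.util.ArrayList;
import java.util.List;

final class PolygonOutline {
    /** Receives one line segment; a four-double method such as strokeLine fits this shape. */
    @FunctionalInterface
    interface LineSink {
        void line(double x1, double y1, double x2, double y2);
    }

    /** Emits the closed outline of the first count points as individual segments. */
    static void strokeAsLines(double[] xs, double[] ys, int count, LineSink sink) {
        if (count < 3 || xs.length < count || ys.length < count) {
            return; // this sketch only handles real polygons (three or more points)
        }
        for (int i = 0; i < count; i++) {
            int next = (i + 1) % count; // wrap to the first point to close the outline
            sink.line(xs[i], ys[i], xs[next], ys[next]);
        }
    }

    public static void main(String[] args) {
        double[] xs = {0, 10, 5};
        double[] ys = {0, 0, 8};
        List<String> segments = new ArrayList<>();
        // Collect the segments as text instead of drawing them.
        strokeAsLines(xs, ys, 3, (x1, y1, x2, y2) ->
                segments.add(x1 + "," + y1 + " -> " + x2 + "," + y2));
        segments.forEach(System.out::println);
    }
}
```

A polygon with n points produces n strokeLine calls this way, so it trades one buffered path operation for n fixed-size operations. Whether that is faster on a given platform and pipeline is exactly what the benchmark in this message measures, so it is worth timing both variants rather than assuming the result carries over.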