Hi Jim,

Definitely discrete GPU on the iMac:

java -cp target/DemoFX.jar -Dprism.verbose=true com.chrisnewland.demofx.standalone.Sierpinski
Prism pipeline init order: es2 sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
Prism pipeline name = com.sun.prism.es2.ES2Pipeline
Loading ES2 native library ... prism_es2 succeeded.
GLFactory using com.sun.prism.es2.MacGLFactory
(X) Got class = class com.sun.prism.es2.ES2Pipeline
Initialized prism pipeline: com.sun.prism.es2.ES2Pipeline
Maximum supported texture size: 16384
Maximum texture size clamped to 4096
Non power of two texture support = true
Maximum number of vertex attributes = 16
Maximum number of uniform vertex components = 3072
Maximum number of uniform fragment components = 3072
Maximum number of varying components = 128
Maximum number of texture units usable in a vertex shader = 16
Maximum number of texture units usable in a fragment shader = 16
Graphics Vendor: ATI Technologies Inc.
Renderer: AMD Radeon HD 6970M OpenGL Engine
Version: 2.1 ATI-1.24.38
vsync: true vpipe: true
fps: 1
ES2ResourceFactory: Prism - createStockShader: Solid_Color.frag
ES2ResourceFactory: Prism - createStockShader: FillPgram_Color.frag
Loading Prism common native library ... succeeded.
ES2ResourceFactory: Prism - createStockShader: Texture_Color.frag
ES2ResourceFactory: Prism - createStockShader: Solid_TextureRGB.frag
fps: 23
fps: 18
fps: 25
fps: 18
fps: 23
fps: 23
fps: 19
fps: 25
fps: 18

With software pipeline:

java -cp target/DemoFX.jar -Dprism.verbose=true -Dprism.order=sw com.chrisnewland.demofx.standalone.Sierpinski
Prism pipeline init order: sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
*** Fallback to Prism SW pipeline
Prism pipeline name = com.sun.prism.sw.SWPipeline
(X) Got class = class com.sun.prism.sw.SWPipeline
Initialized prism pipeline: com.sun.prism.sw.SWPipeline
vsync: true vpipe: false
fps: 1
Loading Prism common native library ... succeeded.
fps: 53
fps: 60
fps: 60
fps: 60
fps: 60

But earlier I got similar performance drop for es2 on a Linux system with
discrete Nvidia graphics (see my previous email).

I'll see if I can find a Windows box with discrete graphics to test if all
platforms exhibit this behaviour.

Cheers,

Chris

On Wed, April 8, 2015 00:16, Jim Graham wrote:
> OK, I took the time to put my rMBP on a diet yesterday and find room to
> install a 10.10 partition. I get the same numbers for Sierpinski on 10.10,
> so my theory that something changed in the OGL implementation for 10.10
> doesn't hold water.
>
> But, I then tried it using the integrated graphics. I get really poor
> performance using the integrated Intel 4000 graphics, but I get great
> numbers on the discrete nVidia 650m. It makes sense that the Intel
> graphics wouldn't be as powerful as the discrete graphics, but we
> shouldn't be taxing it that much to make that big of a difference.
>
> Just to be sure - is that iMac a dual graphics system, or is it
> all-AMD-all-the-time? You can see which GPU is being used if you run it
> with -Dprism.verbose=true...
>
> ...jim
>
> On 4/2/15 4:13 PM, Jim Graham wrote:
>
>> On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw. Are you
>> running a newer version of MacOS?
>>
>> ...jim
>>
>> On 3/31/15 3:40 PM, Chris Newland wrote:
>>
>>> Hi Hervé,
>>>
>>> That's a valid question :)
>>>
>>> Probably because
>>>
>>> a) All my non-UI graphics experience is with immediate-mode / raster
>>> systems
>>>
>>> b) I'm interested in using JavaFX for particle effects / demoscene /
>>> gaming so assumed (perhaps wrongly?) that scenegraph was not the way
>>> to go for that due to the very large number of nodes.
>>>
>>> Numbers for my Sierpinski filled triangle example:
>>>
>>> System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M
>>> 1024 MB
>>>
>>> java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>>> fps: 1
>>> fps: 23
>>> fps: 18
>>> fps: 25
>>> fps: 18
>>> fps: 23
>>> fps: 23
>>> fps: 19
>>> fps: 25
>>>
>>> java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>>> fps: 1
>>> fps: 54
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>>
>>> There are never more than 2500 filled triangles on screen. JDK is
>>> 1.8.0_40
>>>
>>> I would say there is a performance problem here? (or at least a need
>>> for documentation so as to set expectations for gc.fillPolygon).
>>>
>>> Best regards,
>>>
>>> Chris
>>>
>>> On Tue, March 31, 2015 22:00, Hervé Girod wrote:
>>>
>>>> Why don't you use Nodes rather than Canvas ?
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Mar 31, 2015, at 22:31, Chris Newland
>>>>> <cnewl...@chrisnewland.com> wrote:
>>>>>
>>>>> Hi Jim,
>>>>>
>>>>> Thanks, that makes things much clearer.
>>>>>
>>>>> I was surprised how much was going on under the hood of
>>>>> GraphicsContext and hoped it was just magic glue that gave the
>>>>> best of GPU acceleration where available and immediate-mode-like
>>>>> simple rasterizing where not.
>>>>>
>>>>> I've managed to find an anomaly with GraphicsContext.fillPolygon
>>>>> where the software pipeline achieves the full 60fps but ES2 can
>>>>> only manage 30-35fps. It uses lots of overlapping filled triangles
>>>>> so I expect suffers from the problem you've described.
>>>>>
>>>>> SSCCE:
>>>>> https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java
>>>>>
>>>>> Was full frame rate canvas drawing an expected use case for
>>>>> JavaFX or would I be better off with Graphics2D?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>>> On Mon, March 30, 2015 20:04, Jim Graham wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> drawLine() is a very simple primitive that can be optimized
>>>>>> with a GPU shader. It either looks like a (potentially rotated)
>>>>>> rectangle or a rounded rect - and we have optimized shaders for
>>>>>> both cases. A large number of drawLine() calls turns into simply
>>>>>> accumulating a large vertex list and uploading it to the GPU
>>>>>> with an appropriate shader which is very fast.
>>>>>>
>>>>>> drawPolygon() is a very complex operation that involves things
>>>>>> like:
>>>>>>
>>>>>> - dealing with line joins between segments that don't exist for
>>>>>>   drawLine()
>>>>>> - dealing with only rendering common points of intersection once
>>>>>>
>>>>>> To handle all of that complexity we have to involve a
>>>>>> rasterizer that takes the entire collection of lines, analyzes
>>>>>> the stroke attributes and interactions and computes a coverage
>>>>>> mask for each pixel in the region. We do that in software
>>>>>> currently for all pipelines.
>>>>>>
>>>>>> For the ES2 pipeline Line.v.Poly is dominated by pure GPU vs
>>>>>> CPU path rasterization.
>>>>>>
>>>>>> For the SW pipeline, drawLine is a simplified case of
>>>>>> drawPolygon and so the overhead of lots of calls to drawLine()
>>>>>> dominates its performance.
>>>>>>
>>>>>> I would expect ES2 to blow the SW pipeline out of the water
>>>>>> with drawLine() performance (as long as there are no additional
>>>>>> rendering primitives interspersed in the set of lines).
>>>>>>
>>>>>> But, both should be on the same footing for the drawPolygon
>>>>>> case. Does the ES2 pipeline compare similarly (hopefully better
>>>>>> than) the SW pipeline for the polygon case?
>>>>>>
>>>>>> One thing I noticed is that we have no optimized case for
>>>>>> drawLine() on the SW pipeline. It generates a path containing a
>>>>>> single MOVETO and LINETO and feeds it to the generalized path
>>>>>> rasterizer when it could instead compute the rounded/square
>>>>>> rectangle and render it more directly. If we added that support
>>>>>> then I'd expect the SW pipeline to perform the set of drawLine
>>>>>> calls faster than drawPolygon as well...
>>>>>>
>>>>>> ...jim
>>>>>>
>>>>>>> On 3/28/15 3:22 AM, Chris Newland wrote:
>>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> I've not filed a Jira yet as I was hoping to find time to
>>>>>>> investigate thoroughly but when I saw your question I thought
>>>>>>> I'd better add my findings.
>>>>>>>
>>>>>>> I believe the issue is in the ES2Pipeline as if I run with
>>>>>>> -Dprism.order=sw then strokePolygon outperforms the series of
>>>>>>> strokeLine commands as expected:
>>>>>>>
>>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line
>>>>>>> Result: 44fps
>>>>>>>
>>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly
>>>>>>> Result: 60fps
>>>>>>>
>>>>>>> Will see if I can find the root cause as I've got plenty more
>>>>>>> examples where ES2Pipeline performs horribly on my Mac which
>>>>>>> should have no problem throwing around a few thousand polys.
>>>>>>>
>>>>>>> I realise there's a *lot* of indirection involved in making
>>>>>>> JavaFX support such a wide range of underlying graphics systems
>>>>>>> but I do think there's a bug here.
>>>>>>>
>>>>>>> Will file a Jira if I can contribute a bit more than "feels
>>>>>>> slow" ;)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>> On Sat, March 28, 2015 10:06, Robert Krüger wrote:
>>>>>>>>
>>>>>>>> This is consistent with what I am observing. Is this
>>>>>>>> something that Oracle is aware of? Looking at Jira, I don't
>>>>>>>> see that anyone is working on this:
>>>>>>>>
>>>>>>>> https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance)
>>>>>>>>
>>>>>>>> Given that one of the main reasons to use JFX for me is to be
>>>>>>>> able to develop with one code base for at least OSX and
>>>>>>>> Windows and the official statement what JavaFX is for, i.e.
>>>>>>>>
>>>>>>>> "JavaFX is a set of graphics and media packages that
>>>>>>>> enables developers to design, create, test, debug, and
>>>>>>>> deploy rich client applications that operate consistently
>>>>>>>> across diverse platforms"
>>>>>>>>
>>>>>>>> and the fact that this is clearly not the case currently
>>>>>>>> (8u40) as soon as I do something else than simple forms, I
>>>>>>>> run into performance/quality problems on the Mac, I am a bit
>>>>>>>> unsure what to make of all that. Is Mac OSX a second-class
>>>>>>>> citizen as far as dev resources are concerned?
>>>>>>>>
>>>>>>>> Tobi and Chris, have you filed Jira Issues on Mac graphics
>>>>>>>> performance that can be tracked?
>>>>>>>>
>>>>>>>> I will file an issue with a simple test case and hope for
>>>>>>>> the best.
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland
>>>>>>>> <cnewl...@chrisnewland.com> wrote:
>>>>>>>>
>>>>>>>>> Possibly related:
>>>>>>>>>
>>>>>>>>> I can reproduce a massive (90%) performance drop on OSX
>>>>>>>>> between drawing a wireframe polygon on a Canvas using a
>>>>>>>>> series of gc.strokeLine(double x1, double y1, double x2,
>>>>>>>>> double y2) commands versus using a single
>>>>>>>>> gc.strokePolygon(double[] xPoints, double[] yPoints, int
>>>>>>>>> count) command.
>>>>>>>>>
>>>>>>>>> Creating the polygons manually with strokeLine() is
>>>>>>>>> significantly faster using the ES2Pipeline on OSX.
>>>>>>>>>
>>>>>>>>> This is reproducible in a little GitHub JavaFX
>>>>>>>>> benchmarking project I've created:
>>>>>>>>> https://github.com/chriswhocodes/DemoFX
>>>>>>>>>
>>>>>>>>> Build with ant
>>>>>>>>>
>>>>>>>>> Run with:
>>>>>>>>>
>>>>>>>>> # use strokeLine
>>>>>>>>> ./run.sh -c 5000 -m line
>>>>>>>>> result: 60 (sixty) fps
>>>>>>>>>
>>>>>>>>> # use strokePolygon
>>>>>>>>> ./run.sh -c 5000 -m poly
>>>>>>>>> result: 6 (six) fps
>>>>>>>>>
>>>>>>>>> System is 2011 iMac 27" / Mavericks / 3.4GHz Core i7 /
>>>>>>>>> 20GB RAM / Radeon 6970M 1024MB
>>>>>>>>>
>>>>>>>>> Looking at the code paths in
>>>>>>>>> javafx.scene.canvas.GraphicsContext:
>>>>>>>>>
>>>>>>>>> gc.strokeLine() maps to writeOp4(x1, y1, x2, y2,
>>>>>>>>> NGCanvas.STROKE_LINE)
>>>>>>>>>
>>>>>>>>> gc.strokePolygon() maps to writePoly(xPoints, yPoints,
>>>>>>>>> nPoints, true, NGCanvas.STROKE_PATH) which involves
>>>>>>>>> significantly more work with adding to and flushing a
>>>>>>>>> GrowableDataBuffer.
>>>>>>>>>
>>>>>>>>> I've not had time to dig any deeper than this but it's
>>>>>>>>> surely a bug when building a poly manually is 10x faster
>>>>>>>>> than using the convenience method.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On Fri, March 27, 2015 21:26, Tobias Bley wrote:
>>>>>>>>>
>>>>>>>>>> In my opinion the whole graphics performance on MacOSX
>>>>>>>>>> isn’t good at all with JavaFX….
>>>>>>>>>>
>>>>>>>>>>> On 27.03.2015 at 22:10, Robert Krüger
>>>>>>>>>>> <krue...@lesspain.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The bad full screen performance is without the arcs.
>>>>>>>>>>> It is just one call to fillRect, two to strokeOval and
>>>>>>>>>>> one to fillOval, that's all. I will build a simple test
>>>>>>>>>>> case and file an issue.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham
>>>>>>>>>>> <james.gra...@oracle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>>
>>>>>>>>>>>> Please file a Jira issue with a simple test case.
>>>>>>>>>>>> Arcs are handled as a generalized shape rather than
>>>>>>>>>>>> via a predetermined shader, but it shouldn't be that
>>>>>>>>>>>> slow. Something else may be going on.
>>>>>>>>>>>>
>>>>>>>>>>>> Another test might be to replace the arcs with
>>>>>>>>>>>> rectangles or ellipses and see if the performance
>>>>>>>>>>>> changes...
>>>>>>>>>>>>
>>>>>>>>>>>> ...jim
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/27/15 1:52 PM, Robert Krüger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a super-simple animation implemented using
>>>>>>>>>>>>> AnimationTimer and Canvas where the canvas just
>>>>>>>>>>>>> performs a few draw operations, i.e. fills the
>>>>>>>>>>>>> screen with a color and then draws and fills 2-3
>>>>>>>>>>>>> circles and I have already observed that each
>>>>>>>>>>>>> drawing operation I add, results in significant CPU
>>>>>>>>>>>>> load (e.g. when I draw < 10 arcs in addition to the
>>>>>>>>>>>>> circles, the CPU load goes up to 30-40% on a Mac
>>>>>>>>>>>>> Book Pro for a Canvas size of 600x600(!).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I tested the animation in full screen mode
>>>>>>>>>>>>> (only with a few circles) and playback is unusable
>>>>>>>>>>>>> for a serious application (very choppy). Is 2D
>>>>>>>>>>>>> canvas performance known to be very bad on Mac or am
>>>>>>>>>>>>> I doing something wrong? Are there workarounds for
>>>>>>>>>>>>> this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Robert
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Robert Krüger
>>>>>>>>>>> Managing Partner
>>>>>>>>>>> Lesspain GmbH & Co. KG
>>>>>>>>>>>
>>>>>>>>>>> www.lesspain-software.com
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Krüger
>>>>>>>> Managing Partner
>>>>>>>> Lesspain GmbH & Co. KG
>>>>>>>>
>>>>>>>> www.lesspain-software.com
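
For anyone who wants to try the "build the poly manually" approach from the thread, here is a minimal sketch of turning the arguments of gc.strokePolygon(double[] xPoints, double[] yPoints, int count) into a series of strokeLine-style segments. The names PolyAsLines, LineSink and strokePolygonAsLines are hypothetical and are not part of JavaFX or DemoFX. The sink is a plain interface so the sketch runs without JavaFX on the classpath. As Jim points out, separate lines have no line joins between segments, so the corners may not look identical to a real strokePolygon() call.

```java
import java.util.ArrayList;
import java.util.List;

public class PolyAsLines {

    /** Receives one line segment at a time (hypothetical interface). */
    @FunctionalInterface
    public interface LineSink {
        void line(double x1, double y1, double x2, double y2);
    }

    /**
     * Emits the closed outline of a polygon as individual segments.
     * Hypothetical helper for illustration only.
     */
    public static void strokePolygonAsLines(double[] xPoints, double[] yPoints,
                                            int count, LineSink sink) {
        if (count < 2 || xPoints.length < count || yPoints.length < count) {
            return; // not enough points to form a segment
        }
        // Two points form a single segment; three or more need a closing edge.
        int edges = (count == 2) ? 1 : count;
        for (int i = 0; i < edges; i++) {
            int next = (i + 1) % count; // wraps back to the first point
            sink.line(xPoints[i], yPoints[i], xPoints[next], yPoints[next]);
        }
    }

    public static void main(String[] args) {
        double[] xs = {0, 100, 50};
        double[] ys = {0, 0, 80};

        // Collect the segments instead of drawing them, so this runs anywhere.
        List<double[]> segments = new ArrayList<>();
        strokePolygonAsLines(xs, ys, 3,
                (x1, y1, x2, y2) -> segments.add(new double[] {x1, y1, x2, y2}));

        for (double[] s : segments) {
            System.out.printf("(%.0f,%.0f) -> (%.0f,%.0f)%n", s[0], s[1], s[2], s[3]);
        }
    }
}
```

Inside a JavaFX application, the method reference gc::strokeLine should fit the sink, since GraphicsContext.strokeLine takes the same four doubles. Whether this is actually faster than strokePolygon() will depend on the pipeline and platform, as the numbers in this thread show.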