Hi Jim,

Definitely discrete GPU on the iMac:

java -cp target/DemoFX.jar -Dprism.verbose=true com.chrisnewland.demofx.standalone.Sierpinski

Prism pipeline init order: es2 sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
Prism pipeline name = com.sun.prism.es2.ES2Pipeline
Loading ES2 native library ... prism_es2
        succeeded.
GLFactory using com.sun.prism.es2.MacGLFactory
(X) Got class = class com.sun.prism.es2.ES2Pipeline
Initialized prism pipeline: com.sun.prism.es2.ES2Pipeline
Maximum supported texture size: 16384
Maximum texture size clamped to 4096
Non power of two texture support = true
Maximum number of vertex attributes = 16
Maximum number of uniform vertex components = 3072
Maximum number of uniform fragment components = 3072
Maximum number of varying components = 128
Maximum number of texture units usable in a vertex shader = 16
Maximum number of texture units usable in a fragment shader = 16
Graphics Vendor: ATI Technologies Inc.
       Renderer: AMD Radeon HD 6970M OpenGL Engine
        Version: 2.1 ATI-1.24.38
 vsync: true vpipe: true
fps: 1
ES2ResourceFactory: Prism - createStockShader: Solid_Color.frag
ES2ResourceFactory: Prism - createStockShader: FillPgram_Color.frag
Loading Prism common native library ...
        succeeded.
ES2ResourceFactory: Prism - createStockShader: Texture_Color.frag
ES2ResourceFactory: Prism - createStockShader: Solid_TextureRGB.frag
fps: 23
fps: 18
fps: 25
fps: 18
fps: 23
fps: 23
fps: 19
fps: 25
fps: 18

With software pipeline:

java -cp target/DemoFX.jar -Dprism.verbose=true -Dprism.order=sw com.chrisnewland.demofx.standalone.Sierpinski

Prism pipeline init order: sw
Using native-based Pisces rasterizer
Using dirty region optimizations
Not using texture mask for primitives
Not forcing power of 2 sizes for textures
Using hardware CLAMP_TO_ZERO mode
Opting in for HiDPI pixel scaling
*** Fallback to Prism SW pipeline
Prism pipeline name = com.sun.prism.sw.SWPipeline
(X) Got class = class com.sun.prism.sw.SWPipeline
Initialized prism pipeline: com.sun.prism.sw.SWPipeline
 vsync: true vpipe: false
fps: 1
Loading Prism common native library ...
        succeeded.
fps: 53
fps: 60
fps: 60
fps: 60
fps: 60
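To compare the two runs numerically rather than by eye, a small helper can pull the `fps: N` samples out of the console output and average them. It skips the first sample on the assumption that it is warm-up (it reads `fps: 1` in both runs). This is a sketch for summarizing the logs, not part of DemoFX, and the class and method names are made up. On the samples quoted here it gives a mean of roughly 21.3 fps for es2 (9 samples) and 58.6 fps for sw (5 samples).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FpsLogStats {

    // Collects the integer from every line of the form "fps: N", in order.
    // Other log lines (pipeline info, shader loading, etc.) are ignored.
    static List<Integer> parseFps(String log) {
        List<Integer> samples = new ArrayList<>();
        for (String line : log.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("fps: ")) {
                samples.add(Integer.parseInt(trimmed.substring("fps: ".length()).trim()));
            }
        }
        return samples;
    }

    // Mean of every sample after the first, treating the first as warm-up.
    static double steadyStateMean(List<Integer> samples) {
        if (samples.size() < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sum = 0;
        for (int i = 1; i < samples.size(); i++) {
            sum += samples.get(i);
        }
        return sum / (samples.size() - 1);
    }

    public static void main(String[] args) {
        String es2 = String.join("\n", "fps: 1", "fps: 23", "fps: 18", "fps: 25",
                "fps: 18", "fps: 23", "fps: 23", "fps: 19", "fps: 25", "fps: 18");
        String sw = String.join("\n", "fps: 1", "fps: 53", "fps: 60", "fps: 60",
                "fps: 60", "fps: 60");
        System.out.printf(Locale.ROOT, "es2 mean: %.1f fps%n", steadyStateMean(parseFps(es2)));
        System.out.printf(Locale.ROOT, "sw  mean: %.1f fps%n", steadyStateMean(parseFps(sw)));
    }
}
```

The whole `-Dprism.verbose=true` output can be passed to `parseFps` unchanged, since non-`fps:` lines are skipped.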

But earlier I got a similar performance drop for es2 on a Linux system with
discrete Nvidia graphics (see my previous email).

I'll see if I can find a Windows box with discrete graphics to test whether all
platforms exhibit this behaviour.

Cheers,

Chris


On Wed, April 8, 2015 00:16, Jim Graham wrote:
> OK, I took the time to put my rMBP on a diet yesterday and find room to
> install a 10.10 partition.  I get the same numbers for Sierpinski on 10.10,
> so my theory that something changed in the OGL implementation for 10.10
> doesn't hold water.
>
> But, I then tried it using the integrated graphics.  I get really poor
> performance using the integrated Intel 4000 graphics, but I get great
> numbers on the discrete nVidia 650m.  It makes sense that the Intel
> graphics wouldn't be as powerful as the discrete graphics, but we
> shouldn't be taxing it that much to make that big of a difference.
>
> Just to be sure - is that iMac a dual graphics system, or is it
> all-AMD-all-the-time?  You can see which GPU is being used if you run it
> with -Dprism.verbose=true...
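As a sketch of this suggestion, the GPU lines in the `-Dprism.verbose=true` output can also be picked out programmatically. The line prefixes (`Graphics Vendor:`, `Renderer:`, `Version:`) are taken from the ES2 log quoted at the top of this thread; the helper itself is hypothetical. None of these lines appear in the SW pipeline log quoted there, so an empty result for a SW run is expected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrismGpuInfo {

    private static final String[] KEYS = {"Graphics Vendor", "Renderer", "Version"};

    // Returns the vendor/renderer/version values found in -Dprism.verbose=true
    // output. Lines are trimmed first because Renderer and Version are
    // indented in the log.
    static Map<String, String> extract(String verboseLog) {
        Map<String, String> info = new LinkedHashMap<>();
        for (String line : verboseLog.split("\n")) {
            String trimmed = line.trim();
            for (String key : KEYS) {
                String prefix = key + ":";
                if (trimmed.startsWith(prefix)) {
                    info.put(key, trimmed.substring(prefix.length()).trim());
                }
            }
        }
        return info;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "Maximum supported texture size: 16384",
                "Graphics Vendor: ATI Technologies Inc.",
                "       Renderer: AMD Radeon HD 6970M OpenGL Engine",
                "        Version: 2.1 ATI-1.24.38");
        extract(log).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```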
>
> ...jim
>
>
> On 4/2/15 4:13 PM, Jim Graham wrote:
>
>> On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw.  Are you
>> running a newer version of MacOS?
>>
>> ...jim
>>
>>
>> On 3/31/15 3:40 PM, Chris Newland wrote:
>>
>>> Hi Hervé,
>>>
>>> That's a valid question :)
>>>
>>> Probably because
>>>
>>> a) All my non-UI graphics experience is with immediate-mode / raster
>>> systems
>>>
>>> b) I'm interested in using JavaFX for particle effects / demoscene /
>>> gaming so assumed (perhaps wrongly?) that scenegraph was not the way
>>> to go for that due to the very large number of nodes.
>>>
>>> Numbers for my Sierpinski filled triangle example:
>>>
>>> System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB
>>>
>>> java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>>> fps: 1
>>> fps: 23
>>> fps: 18
>>> fps: 25
>>> fps: 18
>>> fps: 23
>>> fps: 23
>>> fps: 19
>>> fps: 25
>>>
>>> java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>>> fps: 1
>>> fps: 54
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>> fps: 60
>>>
>>> There are never more than 2500 filled triangles on screen. JDK is
>>> 1.8.0_40.
>>>
>>> I would say there is a performance problem here (or at least a need
>>> for documentation to set expectations for gc.fillPolygon).
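For anyone reproducing this without the repo, a Sierpinski fill workload of the kind described can be generated by recursive subdivision. This is a sketch, not DemoFX's Sierpinski.java: the recursion, the 600-pixel size and the depth are all assumptions. Each level triples the count, so depth 7 gives 3^7 = 2187 triangles, just under the 2500 mentioned above, while depth 8 would give 6561.

```java
import java.util.ArrayList;
import java.util.List;

public class SierpinskiTriangles {

    // One filled triangle, stored in the array form that
    // GraphicsContext.fillPolygon(double[], double[], int) takes.
    static final class Tri {
        final double[] xs;
        final double[] ys;

        Tri(double[] xs, double[] ys) {
            this.xs = xs;
            this.ys = ys;
        }
    }

    // Splits triangle ABC into its three corner sub-triangles (the middle
    // one is left empty) until depth reaches 0, then emits the leaves.
    static void subdivide(double ax, double ay, double bx, double by,
                          double cx, double cy, int depth, List<Tri> out) {
        if (depth == 0) {
            out.add(new Tri(new double[] {ax, bx, cx}, new double[] {ay, by, cy}));
            return;
        }
        double abx = (ax + bx) / 2, aby = (ay + by) / 2;
        double bcx = (bx + cx) / 2, bcy = (by + cy) / 2;
        double cax = (cx + ax) / 2, cay = (cy + ay) / 2;
        subdivide(ax, ay, abx, aby, cax, cay, depth - 1, out);
        subdivide(abx, aby, bx, by, bcx, bcy, depth - 1, out);
        subdivide(cax, cay, bcx, bcy, cx, cy, depth - 1, out);
    }

    static List<Tri> build(double size, int depth) {
        List<Tri> tris = new ArrayList<>();
        // Screen coordinates: apex at top centre, base along the bottom edge.
        subdivide(0, size, size / 2, 0, size, size, depth, tris);
        return tris;
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 8; depth++) {
            System.out.println("depth " + depth + ": " + build(600, depth).size() + " triangles");
        }
    }
}
```

In an animation loop each `Tri t` would then be drawn with `gc.fillPolygon(t.xs, t.ys, 3)`, i.e. one fillPolygon call per triangle.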
>>>
>>> Best regards,
>>>
>>> Chris
>>>
>>> On Tue, March 31, 2015 22:00, Hervé Girod wrote:
>>>
>>>> Why don't you use Nodes rather than Canvas?
>>>>
>>>>> On Mar 31, 2015, at 22:31, Chris Newland
>>>>> <cnewl...@chrisnewland.com>
>>>>> wrote:
>>>>>
>>>>> Hi Jim,
>>>>>
>>>>> Thanks, that makes things much clearer.
>>>>>
>>>>> I was surprised how much was going on under the hood of
>>>>> GraphicsContext
>>>>> and hoped it was just magic glue that gave the best of GPU
>>>>> acceleration where available and immediate-mode-like simple
>>>>> rasterizing where not.
>>>>>
>>>>> I've managed to find an anomaly with GraphicsContext.fillPolygon
>>>>> where the software pipeline achieves the full 60fps but ES2 can
>>>>> only manage 30-35fps. It uses lots of overlapping filled triangles,
>>>>> so I expect it suffers from the problem you've described.
>>>>>
>>>>> SSCCE:
>>>>> https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java
>>>>>
>>>>> Was full-frame-rate canvas drawing an expected use case for
>>>>> JavaFX, or would I be better off with Graphics2D?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>>> On Mon, March 30, 2015 20:04, Jim Graham wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> drawLine() is a very simple primitive that can be optimized
>>>>>> with a GPU
>>>>>> shader.  It either looks like a (potentially rotated) rectangle
>>>>>> or a rounded rect - and we have optimized shaders for both
>>>>>> cases.  A large number of drawLine() calls turns into simply
>>>>>> accumulating a large vertex list and uploading it to the GPU
>>>>>> with an appropriate shader which is very fast.
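To make the "accumulating a large vertex list" step concrete, here is a CPU-side sketch of how a batch of butt-capped lines might be turned into triangles: each line becomes a rotated rectangle (two triangles, six vertices) appended to one growable float array, which would then be handed to the GPU in a single upload. The vertex layout, class name and cap handling are assumptions for illustration only; Prism's real vertex format and shaders (including the rounded-rect case) are not shown.

```java
import java.util.Arrays;

public class LineBatchSketch {

    private float[] verts = new float[64];
    private int size; // number of floats written so far

    private void put(double x, double y) {
        if (size + 2 > verts.length) {
            verts = Arrays.copyOf(verts, verts.length * 2);
        }
        verts[size++] = (float) x;
        verts[size++] = (float) y;
    }

    // Appends a butt-capped line of the given width as a rotated rectangle
    // made of two triangles (6 vertices). Zero-length lines are skipped.
    void addLine(double x1, double y1, double x2, double y2, double width) {
        double dx = x2 - x1, dy = y2 - y1;
        double len = Math.hypot(dx, dy);
        if (len == 0) {
            return;
        }
        // Unit normal to the line, scaled to half the stroke width.
        double nx = -dy / len * width / 2;
        double ny = dx / len * width / 2;
        double ax = x1 + nx, ay = y1 + ny; // corner A
        double bx = x2 + nx, by = y2 + ny; // corner B
        double cx = x2 - nx, cy = y2 - ny; // corner C
        double ex = x1 - nx, ey = y1 - ny; // corner D
        put(ax, ay); put(bx, by); put(cx, cy); // triangle A-B-C
        put(ax, ay); put(cx, cy); put(ex, ey); // triangle A-C-D
    }

    int vertexCount() {
        return size / 2;
    }

    float[] vertices() {
        return Arrays.copyOf(verts, size);
    }

    public static void main(String[] args) {
        LineBatchSketch batch = new LineBatchSketch();
        for (int i = 0; i < 5000; i++) {
            batch.addLine(0, i, 100, i, 1.0);
        }
        // In a real pipeline this single array would be uploaded in one go.
        System.out.println(batch.vertexCount() + " vertices");
    }
}
```

The point of the design is that the per-line work is a handful of arithmetic operations and an array append, with no per-pixel work on the CPU.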
>>>>>>
>>>>>> drawPolygon() is a very complex operation that involves things
>>>>>> like:
>>>>>>
>>>>>> - dealing with line joins between segments that don't exist for
>>>>>>   drawLine()
>>>>>> - dealing with only rendering common points of intersection once
>>>>>>
>>>>>> To handle all of that complexity we have to involve a
>>>>>> rasterizer that takes the entire collection of lines, analyzes
>>>>>> the stroke attributes and interactions and computes a coverage
>>>>>> mask for each pixel in the region. We do that in software
>>>>>> currently for all pipelines.
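A brute-force way to see what "a coverage mask for each pixel" means: sample each pixel on an s x s grid and record the fraction of samples that fall inside the shape. The sketch below does this for a filled polygon using the even-odd rule. It only illustrates the concept and is not how Prism's rasterizer is implemented (the logs in this thread report a native Pisces rasterizer). Stroking, as in drawPolygon(), would additionally require turning the outline into a fillable shape with joins, which is the extra work described above.

```java
import java.util.Arrays;

public class CoverageMaskSketch {

    // Even-odd point-in-polygon test: count edge crossings of a ray to +x.
    static boolean inside(double[] xs, double[] ys, int n, double px, double py) {
        boolean in = false;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            boolean crosses = (ys[i] > py) != (ys[j] > py);
            if (crosses) {
                double xAtY = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < xAtY) {
                    in = !in;
                }
            }
        }
        return in;
    }

    // Returns a height x width mask where each entry is the fraction (0..1)
    // of the s*s sample points in that pixel that lie inside the polygon.
    static double[][] coverage(double[] xs, double[] ys, int n,
                               int width, int height, int s) {
        double[][] mask = new double[height][width];
        for (int py = 0; py < height; py++) {
            for (int px = 0; px < width; px++) {
                int hits = 0;
                for (int sy = 0; sy < s; sy++) {
                    for (int sx = 0; sx < s; sx++) {
                        double x = px + (sx + 0.5) / s;
                        double y = py + (sy + 0.5) / s;
                        if (inside(xs, ys, n, x, y)) {
                            hits++;
                        }
                    }
                }
                mask[py][px] = hits / (double) (s * s);
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // Right triangle whose hypotenuse runs diagonally across a 4x4 area.
        double[] xs = {0, 4, 0};
        double[] ys = {0, 4, 4};
        for (double[] row : coverage(xs, ys, 3, 4, 4, 4)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The cost grows with pixels x samples x edges, which is why real rasterizers use scanline techniques instead; the sketch just shows the output they compute.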
>>>>>>
>>>>>> For the ES2 pipeline, line vs. polygon performance is dominated
>>>>>> by pure GPU vs. CPU path rasterization.
>>>>>>
>>>>>> For the SW pipeline, drawLine is a simplified case of
>>>>>> drawPolygon and so the overhead of lots of calls to drawLine()
>>>>>> dominates its performance.
>>>>>>
>>>>>> I would expect ES2 to blow the SW pipeline out of the water
>>>>>> with drawLine() performance (as long as there are no additional
>>>>>> rendering primitives interspersed in the set of lines).
>>>>>>
>>>>>> But, both should be on the same footing for the drawPolygon
>>>>>> case. Does the ES2 pipeline perform similarly to (hopefully better
>>>>>> than) the SW pipeline for the polygon case?
>>>>>>
>>>>>> One thing I noticed is that we have no optimized case for
>>>>>> drawLine() on the SW pipeline.  It generates a path containing a
>>>>>> single MOVETO and LINETO and feeds it to the generalized path
>>>>>> rasterizer when it could instead compute the rounded/square
>>>>>> rectangle and render it more directly.  If we added that support
>>>>>> then I'd expect the SW pipeline to perform the set of drawLine
>>>>>> calls faster than drawPolygon as well...
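The proposed shortcut for the SW pipeline can be sketched like this: instead of building a MOVETO/LINETO path and running the general rasterizer, compute the line's rectangle (extended by half the stroke width at each end for square caps) and test pixel centres against it directly. This is a hypothetical, non-antialiased illustration; round caps and partial coverage are left out, and it is not Prism code.

```java
public class DirectLineRaster {

    // Marks every pixel whose centre lies inside the rectangle swept by a
    // line of the given width. With squareCaps the rectangle extends
    // width/2 past each endpoint; otherwise the ends are butt (flush) caps.
    static boolean[][] rasterize(double x1, double y1, double x2, double y2,
                                 double width, boolean squareCaps,
                                 int w, int h) {
        boolean[][] px = new boolean[h][w];
        double dx = x2 - x1, dy = y2 - y1;
        double len = Math.hypot(dx, dy);
        if (len == 0) {
            return px;
        }
        double ux = dx / len, uy = dy / len; // unit direction of the line
        double half = width / 2;
        double ext = squareCaps ? half : 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double cx = x + 0.5 - x1, cy = y + 0.5 - y1;
                double along = cx * ux + cy * uy;   // position along the line
                double across = -cx * uy + cy * ux; // signed distance from it
                px[y][x] = along >= -ext && along <= len + ext
                        && Math.abs(across) <= half;
            }
        }
        return px;
    }

    static int count(boolean[][] px) {
        int n = 0;
        for (boolean[] row : px) {
            for (boolean on : row) {
                if (on) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[][] px = rasterize(1, 1, 14, 8, 2, false, 16, 10);
        for (boolean[] row : px) {
            StringBuilder sb = new StringBuilder();
            for (boolean on : row) {
                sb.append(on ? '#' : '.');
            }
            System.out.println(sb);
        }
        System.out.println(count(px) + " pixels covered");
    }
}
```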
>>>>>>
>>>>>> ...jim
>>>>>>
>>>>>>> On 3/28/15 3:22 AM, Chris Newland wrote:
>>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> I've not filed a Jira yet as I was hoping to find time to
>>>>>>> investigate thoroughly but when I saw your question I thought
>>>>>>> I'd
>>>>>>> better add my findings.
>>>>>>>
>>>>>>> I believe the issue is in the ES2Pipeline, because if I run with
>>>>>>> -Dprism.order=sw then strokePolygon outperforms the series of
>>>>>>> strokeLine commands as expected:
>>>>>>>
>>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line
>>>>>>> Result: 44fps
>>>>>>>
>>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly
>>>>>>> Result: 60fps
>>>>>>>
>>>>>>> I'll see if I can find the root cause, as I've got plenty more
>>>>>>> examples where the ES2Pipeline performs horribly on my Mac, which
>>>>>>> should have no problem throwing around a few thousand polys.
>>>>>>>
>>>>>>> I realise there's a *lot* of indirection involved in making
>>>>>>> JavaFX
>>>>>>> support such a wide range of underlying graphics systems but I
>>>>>>> do think there's a bug here.
>>>>>>>
>>>>>>> Will file a Jira if I can contribute a bit more than "feels
>>>>>>> slow" ;)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>> On Sat, March 28, 2015 10:06, Robert Krüger wrote:
>>>>>>>>
>>>>>>>> This is consistent with what I am observing. Is this
>>>>>>>> something that Oracle is aware of? Looking at Jira, I don't
>>>>>>>> see that anyone is working on this:
>>>>>>>>
>>>>>>>> https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance)
>>>>>>>>
>>>>>>>> Given that one of the main reasons to use JFX for me is to be
>>>>>>>> able to develop with one code base for at least OSX and Windows,
>>>>>>>> and the official statement of what JavaFX is for, i.e.
>>>>>>>>
>>>>>>>> "JavaFX is a set of graphics and media packages that
>>>>>>>> enables developers to design, create, test, debug, and
>>>>>>>> deploy rich client applications that operate consistently
>>>>>>>> across diverse platforms"
>>>>>>>>
>>>>>>>> and the fact that this is clearly not the case currently
>>>>>>>> (8u40), since as soon as I do something other than simple forms
>>>>>>>> I run into performance/quality problems on the Mac, I am a bit
>>>>>>>> unsure what to make of all that. Is Mac OSX a second-class
>>>>>>>> citizen as far as dev resources are concerned?
>>>>>>>>
>>>>>>>> Tobi and Chris, have you filed Jira Issues on Mac graphics
>>>>>>>> performance that can be tracked?
>>>>>>>>
>>>>>>>> I will file an issue with a simple test case and hope for
>>>>>>>> the best.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland
>>>>>>>> <cnewl...@chrisnewland.com> wrote:
>>>>>>>>
>>>>>>>>> Possibly related:
>>>>>>>>>
>>>>>>>>> I can reproduce a massive (90%) performance drop on OSX
>>>>>>>>> between drawing a wireframe polygon on a Canvas using a
>>>>>>>>> series of gc.strokeLine(double x1, double y1, double x2,
>>>>>>>>> double y2) commands versus using a single
>>>>>>>>> gc.strokePolygon(double[] xPoints, double[] yPoints, int
>>>>>>>>> count) command.
>>>>>>>>>
>>>>>>>>> Creating the polygons manually with strokeLine() is
>>>>>>>>> significantly faster using the ES2Pipeline on OSX.
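For reference, "creating the polygons manually" means issuing one strokeLine() per edge, including the closing edge back to the first point (the strokePolygon path is written with the `true` argument in the writePoly call quoted later in this message, which presumably closes it). The helper below is hypothetical; each returned segment `s` would be drawn with `gc.strokeLine(s[0], s[1], s[2], s[3])`. As Jim explains elsewhere in this thread, separate lines get no line joins, so the two approaches are not pixel-identical at the corners.

```java
import java.util.ArrayList;
import java.util.List;

public class PolygonToLines {

    // Returns the segments {x1, y1, x2, y2} that a series of strokeLine
    // calls would need to outline the same closed polygon, including the
    // closing edge from the last point back to the first.
    static List<double[]> segments(double[] xPoints, double[] yPoints, int nPoints) {
        List<double[]> segs = new ArrayList<>();
        for (int i = 0; i < nPoints; i++) {
            int next = (i + 1) % nPoints;
            segs.add(new double[] {xPoints[i], yPoints[i], xPoints[next], yPoints[next]});
        }
        return segs;
    }

    public static void main(String[] args) {
        double[] xs = {0, 10, 5};
        double[] ys = {0, 0, 8};
        for (double[] s : segments(xs, ys, xs.length)) {
            System.out.printf("strokeLine(%s, %s, %s, %s)%n", s[0], s[1], s[2], s[3]);
        }
    }
}
```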
>>>>>>>>>
>>>>>>>>> This is reproducible in a little GitHub JavaFX
>>>>>>>>> benchmarking project I've created:
>>>>>>>>> https://github.com/chriswhocodes/DemoFX
>>>>>>>>>
>>>>>>>>> Build with ant
>>>>>>>>>
>>>>>>>>> Run with:
>>>>>>>>>
>>>>>>>>> # use strokeLine
>>>>>>>>> ./run.sh -c 5000 -m line
>>>>>>>>> result: 60 (sixty) fps
>>>>>>>>>
>>>>>>>>> # use strokePolygon
>>>>>>>>> ./run.sh -c 5000 -m poly
>>>>>>>>> result: 6 (six) fps
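The 90% and 10x figures follow directly from these two measurements: (60 - 6) / 60 = 0.9, and 60 / 6 = 10. In frame-time terms the strokePolygon run takes about 166.7 ms per frame against a 16.7 ms budget at 60 fps. A trivial helper for that arithmetic (class and method names are hypothetical):

```java
import java.util.Locale;

public class FrameBudget {

    // Milliseconds spent per frame at a given frame rate.
    static double frameTimeMs(double fps) {
        return 1000.0 / fps;
    }

    // Percentage by which 'after' is lower than 'before'.
    static double percentDrop(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double lineFps = 60, polyFps = 6;
        System.out.printf(Locale.ROOT, "strokeLine:    %.1f ms/frame%n", frameTimeMs(lineFps));
        System.out.printf(Locale.ROOT, "strokePolygon: %.1f ms/frame%n", frameTimeMs(polyFps));
        System.out.printf(Locale.ROOT, "drop: %.0f%%, ratio: %.0fx%n",
                percentDrop(lineFps, polyFps), lineFps / polyFps);
    }
}
```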
>>>>>>>>>
>>>>>>>>> System is 2011 iMac 27" / Mavericks / 3.4GHz Core i7 / 20GB RAM / Radeon 6970M 1024MB
>>>>>>>>>
>>>>>>>>> Looking at the code paths in
>>>>>>>>> javafx.scene.canvas.GraphicsContext:
>>>>>>>>>
>>>>>>>>> gc.strokeLine() maps to writeOp4(x1, y1, x2, y2,
>>>>>>>>> NGCanvas.STROKE_LINE)
>>>>>>>>>
>>>>>>>>> gc.strokePolygon() maps to writePoly(xPoints, yPoints,
>>>>>>>>> nPoints, true, NGCanvas.STROKE_PATH) which involves
>>>>>>>>> significantly more work with adding to and flushing a
>>>>>>>>> GrowableDataBuffer.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've not had time to dig any deeper than this but it's
>>>>>>>>> surely a bug when building a poly manually is 10x faster
>>>>>>>>> than using the convenience method.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On Fri, March 27, 2015 21:26, Tobias Bley wrote:
>>>>>>>>>
>>>>>>>>>> In my opinion the whole graphics performance on MacOSX
>>>>>>>>>> isn’t good at all with JavaFX…
>>>>>>>>>>
>>>>>>>>>>> On 27.03.2015 at 22:10, Robert Krüger
>>>>>>>>>>> <krue...@lesspain.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The bad full screen performance is without the arcs.
>>>>>>>>>>> It is
>>>>>>>>>>> just one call to fillRect, two to strokeOval and one
>>>>>>>>>>> to fillOval, that's all. I will build a simple test
>>>>>>>>>>> case and file an issue.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham
>>>>>>>>>>> <james.gra...@oracle.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>>
>>>>>>>>>>>> Please file a Jira issue with a simple test case.
>>>>>>>>>>>> Arcs
>>>>>>>>>>>> are handled as a generalized shape rather than via a
>>>>>>>>>>>>  predetermined shader, but it shouldn't be that
>>>>>>>>>>>> slow. Something else may
>>>>>>>>>>>> be going on.
>>>>>>>>>>>>
>>>>>>>>>>>> Another test might be to replace the arcs with
>>>>>>>>>>>> rectangles or ellipses and see if the performance
>>>>>>>>>>>> changes...
>>>>>>>>>>>>
>>>>>>>>>>>> ...jim
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/27/15 1:52 PM, Robert Krüger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a super-simple animation implemented using
>>>>>>>>>>>>> AnimationTimer and Canvas where the canvas just performs a few
>>>>>>>>>>>>> draw operations, i.e. fills the screen with a color and then
>>>>>>>>>>>>> draws and fills 2-3 circles, and I have already observed that
>>>>>>>>>>>>> each drawing operation I add results in significant CPU load.
>>>>>>>>>>>>> For example, when I draw fewer than 10 arcs in addition to the
>>>>>>>>>>>>> circles, the CPU load goes up to 30-40% on a MacBook Pro for a
>>>>>>>>>>>>> Canvas size of 600x600(!).
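To put a number on "choppy", one option is to count frames inside the animation loop itself. The sketch below assumes the count is driven by the nanosecond timestamp that `AnimationTimer.handle(long now)` receives; DemoFX's own `fps:` counter may work differently, and the class is hypothetical. It is kept free of JavaFX types so it can be tested on its own: from `handle(now)` you would call `counter.tick(now)` and print or display any non-negative result.

```java
public class FpsCounter {

    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private boolean started;
    private long windowStart;
    private int frames;

    // Call once per frame with the frame timestamp in nanoseconds. Returns
    // the number of frames in the one-second window that just closed, or -1
    // while the current window is still open.
    int tick(long nowNanos) {
        if (!started) {
            windowStart = nowNanos;
            started = true;
        }
        int result = -1;
        if (nowNanos - windowStart >= ONE_SECOND_NANOS) {
            result = frames;
            frames = 0;
            windowStart = nowNanos;
        }
        frames++; // this frame belongs to the (possibly new) current window
        return result;
    }

    public static void main(String[] args) {
        FpsCounter counter = new FpsCounter();
        // Simulate two seconds of frames at a steady 60 Hz.
        for (long k = 0; k <= 120; k++) {
            int fps = counter.tick(k * ONE_SECOND_NANOS / 60);
            if (fps >= 0) {
                System.out.println("fps: " + fps);
            }
        }
    }
}
```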
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I tested the animation in full screen mode
>>>>>>>>>>>>> (only
>>>>>>>>>>>>> with a few circles) and playback is unusable for a
>>>>>>>>>>>>>  serious application (very choppy). Is 2D canvas
>>>>>>>>>>>>> performance known to be very bad on Mac or am I
>>>>>>>>>>>>> doing something wrong? Are there workarounds for
>>>>>>>>>>>>> this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Robert Krüger
>>>>>>>>>>> Managing Partner
>>>>>>>>>>> Lesspain GmbH & Co. KG
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> www.lesspain-software.com
>>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>

