Is that on Sierpinski? The main question I still have is whether this
helps other apps. The usage pattern in Sierpinski is fairly specific
and may not translate to all apps (we're still planning on doing this as
it is an easy improvement that doesn't seem to have a down-side, but I
don't want to break out any champagne if it is just this one benchmark)...
...jim
On 4/16/15 12:58 PM, Johan Vos wrote:
Hi Jim,
On iOS, the performance jumped from 2 fps to 15 fps on my old iPad.
Excellent work!
- Johan
2015-04-14 21:16 GMT+02:00 Jim Graham <james.gra...@oracle.com>:
Hi Chris,
We identified a fairly localized optimization that we might be able to
apply to enhance the performance of your Sierpinski program. We don't have
any figures yet on whether this will improve other applications/benchmarks
that people have been discussing, but the improvements with your Sierpinski
program are quite dramatic on a number of platforms and GPUs.
This issue is now being tracked as:
https://javafx-jira.kenai.com/browse/RT-40533
If others could apply the indicated patch to an OpenJFX build and provide
feedback on any improvements (or bugs!) that they see, that would help. In
the meantime, we have a lot of testing to do to verify the correctness of
the changes...
...jim
On 4/8/15 9:25 AM, Chris Newland wrote:
Hi Jim,
I'll post the verbose prism output from my iMac when I get home.
Just tried this on my Linux workstation and the performance gap is the
same between es2 and sw so I don't think it's an OSX issue.
uname -a
Linux chris 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux
"$JAVA_HOME/bin/java" -classpath target/DemoFX.jar
com.chrisnewland.demofx.standalone.Sierpinski
fps: 1
fps: 20
fps: 31
fps: 32
fps: 33
fps: 35
fps: 34
fps: 33
"$JAVA_HOME/bin/java" -Dprism.order=sw -classpath target/DemoFX.jar
com.chrisnewland.demofx.standalone.Sierpinski
fps: 1
fps: 54
fps: 56
fps: 60
fps: 59
fps: 60
fps: 61
fps: 61
fps: 60
This is a Xeon W3520 quad-core HT box with an Nvidia Quadro FX 580
graphics card running driver 304.125
Regards,
Chris
On Wed, April 8, 2015 00:16, Jim Graham wrote:
OK, I took the time to put my rMBP on a diet yesterday and find room to
install a 10.10 partition. I get the same numbers for Sierpinski on
10.10,
so my theory that something changed in the OGL implementation for 10.10
doesn't hold water.
But, I then tried it using the integrated graphics. I get really poor
performance using the integrated Intel 4000 graphics, but I get great
numbers on the discrete nVidia 650m. It makes sense that the Intel
graphics wouldn't be as powerful as the discrete graphics, but we
shouldn't be taxing it that much to make that big of a difference.
Just to be sure - is that iMac a dual graphics system, or is it
all-AMD-all-the-time? You can see which GPU is being used if you run it
with -Dprism.verbose=true...
...jim
On 4/2/15 4:13 PM, Jim Graham wrote:
On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw. Are you
running a newer version of MacOS?
...jim
On 3/31/15 3:40 PM, Chris Newland wrote:
Hi Hervé,
That's a valid question :)
Probably because
a) All my non-UI graphics experience is with immediate-mode / raster
systems
b) I'm interested in using JavaFX for particle effects / demoscene /
gaming so assumed (perhaps wrongly?) that scenegraph was not the way
to go for that due to the very large number of nodes.
Numbers for my Sierpinski filled triangle example:
System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB
java -Dprism.order=es2 -cp target/classes/
com.chrisnewland.demofx.standalone.Sierpinski
fps: 1
fps: 23
fps: 18
fps: 25
fps: 18
fps: 23
fps: 23
fps: 19
fps: 25
java -Dprism.order=sw -cp target/classes/
com.chrisnewland.demofx.standalone.Sierpinski
fps: 1
fps: 54
fps: 60
fps: 60
fps: 60
fps: 60
fps: 60
fps: 60
fps: 60
fps: 60
fps: 60
There are never more than 2500 filled triangles on screen. JDK is
1.8.0_40
I would say there is a performance problem here? (or at least a need
for documentation so as to set expectations for gc.fillPolygon).
Best regards,
Chris
On Tue, March 31, 2015 22:00, Hervé Girod wrote:
Why don't you use Nodes rather than Canvas?
Sent from my iPhone
On Mar 31, 2015, at 22:31, Chris Newland
<cnewl...@chrisnewland.com>
wrote:
Hi Jim,
Thanks, that makes things much clearer.
I was surprised how much was going on under the hood of
GraphicsContext
and hoped it was just magic glue that gave the best of GPU
acceleration where available and immediate-mode-like simple
rasterizing where not.
I've managed to find an anomaly with GraphicsContext.fillPolygon
where the software pipeline achieves the full 60fps but ES2 can
only manage 30-35fps. It uses lots of overlapping filled triangles
so I expect it suffers from the problem you've described.
SSCCE:
https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java
Was full frame rate canvas drawing an expected use case for
JavaFX or
would I be better off with Graphics2D?
Thanks,
Chris
On Mon, March 30, 2015 20:04, Jim Graham wrote:
Hi Chris,
drawLine() is a very simple primitive that can be optimized
with a GPU
shader. It either looks like a (potentially rotated) rectangle
or a rounded rect - and we have optimized shaders for both
cases. A large number of drawLine() calls turns into simply
accumulating a large vertex list and uploading it to the GPU
with an appropriate shader which is very fast.
drawPolygon() is a very complex operation that involves things
like:
- dealing with line joins between segments that don't exist for
drawLine()
- dealing with only rendering common points of intersection once
To handle all of that complexity we have to involve a
rasterizer that takes the entire collection of lines, analyzes
the stroke attributes and interactions and computes a coverage
mask for each pixel in the region. We do that in software
currently for all pipelines.
For the ES2 pipeline, the line vs. poly comparison is dominated by
pure GPU vs. CPU path rasterization.
For the SW pipeline, drawLine is a simplified case of
drawPolygon and so the overhead of lots of calls to drawLine()
dominates its performance.
I would expect ES2 to blow the SW pipeline out of the water
with drawLine() performance (as long as there are no additional
rendering primitives interspersed in the set of lines).
But, both should be on the same footing for the drawPolygon
case. Does the ES2 pipeline perform similarly to (or hopefully
better than) the SW pipeline for the polygon case?
One thing I noticed is that we have no optimized case for
drawLine() on the SW pipeline. It generates a path containing a
single MOVETO and LINETO and feeds it to the generalized path
rasterizer when it could instead compute the rounded/square
rectangle and render it more directly. If we added that support
then I'd expect the SW pipeline to perform the set of drawLine
calls faster than drawPolygon as well...
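
As a rough illustration of the "compute the rectangle and render it more directly" idea, here is a minimal sketch of the geometry for a square (butt-capped) line. The class and method names are hypothetical and this is not the Prism code; it only shows that a stroked line segment reduces to four corner points that could be filled directly, without going through a general path rasterizer. Rounded caps are not handled.

```java
public final class LineQuad {
    private LineQuad() {}

    /**
     * Returns the four corners {x0,y0, x1,y1, x2,y2, x3,y3} of the
     * (possibly rotated) rectangle covered by a butt-capped line of the
     * given stroke width. Returns an empty array for a zero-length line.
     */
    public static double[] corners(double x1, double y1,
                                   double x2, double y2,
                                   double width) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        double len = Math.hypot(dx, dy);
        if (len == 0.0) {
            return new double[0]; // degenerate: nothing to fill
        }
        // Unit normal to the line, scaled to half the stroke width.
        double half = width / 2.0;
        double nx = -dy / len * half;
        double ny = dx / len * half;
        return new double[] {
            x1 + nx, y1 + ny,
            x2 + nx, y2 + ny,
            x2 - nx, y2 - ny,
            x1 - nx, y1 - ny
        };
    }
}
```

For a horizontal line from (0,0) to (10,0) with width 2, the corners come out as a 10x2 axis-aligned rectangle centered on the line.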
...jim
On 3/28/15 3:22 AM, Chris Newland wrote:
Hi Robert,
I've not filed a Jira yet as I was hoping to find time to
investigate thoroughly but when I saw your question I thought
I'd
better add my findings.
I believe the issue is in the ES2Pipeline as if I run with
-Dprism.order=sw then strokePolygon outperforms the series of
strokeLine commands as expected:
java -cp target/DemoFX.jar -Dprism.order=sw
com.chrisnewland.demofx.DemoFXApplication -c 500 -m line
Result:
44fps
java -cp target/DemoFX.jar -Dprism.order=sw
com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly
Result:
60fps
Will see if I can find the root cause as I've got plenty more
examples where ES2Pipeline performs horribly on my Mac which
should have no problem throwing around a few thousand polys.
I realise there's a *lot* of indirection involved in making
JavaFX
support such a wide range of underlying graphics systems but I
do think there's a bug here.
Will file a Jira if I can contribute a bit more than "feels
slow" ;)
Cheers,
Chris
On Sat, March 28, 2015 10:06, Robert Krüger wrote:
This is consistent with what I am observing. Is this
something that Oracle is aware of? Looking at Jira, I don't
see that anyone is working on this:
https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance)
Given that one of the main reasons for me to use JFX is to be
able to develop with one code base for at least OSX and
Windows, and given the official statement of what JavaFX is
for, i.e.
"JavaFX is a set of graphics and media packages that
enables developers to design, create, test, debug, and
deploy rich client applications that operate consistently
across diverse platforms"
and the fact that this is clearly not the case currently
(8u40), since I run into performance/quality problems on the
Mac as soon as I do anything other than simple forms, I am a
bit unsure what to make of all that. Is Mac OSX a second-class
citizen as far as dev resources are concerned?
Tobi and Chris, have you filed Jira Issues on Mac graphics
performance that can be tracked?
I will file an issue with a simple test case and hope for
the best.
On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland
<cnewl...@chrisnewland.com>
wrote:
Possibly related:
I can reproduce a massive (90%) performance drop on OSX
between drawing a wireframe polygon on a Canvas using a
series of gc.strokeLine(double x1, double y1, double x2,
double y2) commands versus using a single
gc.strokePolygon(double[] xPoints, double[] yPoints, int
count) command.
Creating the polygons manually with strokeLine() is
significantly faster using the ES2Pipeline on OSX.
This is reproducible in a little GitHub JavaFX
benchmarking project I've created:
https://github.com/chriswhocodes/DemoFX
Build with ant
Run with:
# use strokeLine
./run.sh -c 5000 -m line
result: 60 (sixty) fps
# use strokePolygon
./run.sh -c 5000 -m poly
result: 6 (six) fps
System is 2011 iMac 27" / Mavericks / 3.4GHz Core i7 /
20GB RAM / Radeon 6970M 1024MB
Looking at the code paths in
javafx.scene.canvas.GraphicsContext:
gc.strokeLine() maps to writeOp4(x1, y1, x2, y2,
NGCanvas.STROKE_LINE)
gc.strokePolygon() maps to writePoly(xPoints, yPoints,
nPoints, true, NGCanvas.STROKE_PATH) which involves
significantly more work with adding to and flushing a
GrowableDataBuffer.
I've not had time to dig any deeper than this but it's
surely a bug when building a poly manually is 10x faster
than using the convenience method.
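
For anyone wanting to reproduce the comparison outside DemoFX, a minimal sketch of the "build the poly manually" side follows. The helper class is hypothetical (it is not part of DemoFX or JavaFX); it only turns the xPoints/yPoints arrays that strokePolygon takes into the closed list of segments, each of which would then be passed to gc.strokeLine(x1, y1, x2, y2). Note that separate lines get no joins at the shared vertices, so the output is not guaranteed to be pixel-identical to strokePolygon.

```java
public final class PolygonSegments {
    private PolygonSegments() {}

    /**
     * Converts a closed polygon into one {x1, y1, x2, y2} row per edge,
     * including the closing edge from the last point back to the first.
     */
    public static double[][] toSegments(double[] xPoints,
                                        double[] yPoints,
                                        int count) {
        if (count < 2 || xPoints.length < count || yPoints.length < count) {
            throw new IllegalArgumentException("need at least 2 points");
        }
        double[][] segments = new double[count][];
        for (int i = 0; i < count; i++) {
            int j = (i + 1) % count; // wraps around to close the polygon
            segments[i] = new double[] {
                xPoints[i], yPoints[i], xPoints[j], yPoints[j]
            };
        }
        return segments;
    }
}
```

In a Canvas render loop each row s would be drawn with gc.strokeLine(s[0], s[1], s[2], s[3]).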
Cheers,
Chris
On Fri, March 27, 2015 21:26, Tobias Bley wrote:
In my opinion the whole graphics performance on MacOSX
isn’t good at all with JavaFX….
On 27.03.2015 at 22:10, Robert Krüger
<krue...@lesspain.de> wrote:
The bad full screen performance is without the arcs.
It is
just one call to fillRect, two to strokeOval and one
to fillOval, that's all. I will build a simple test
case and file an issue.
On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham
<james.gra...@oracle.com>
wrote:
Hi Robert,
Please file a Jira issue with a simple test case.
Arcs
are handled as a generalized shape rather than via a
predetermined shader, but it shouldn't be that
slow. Something else may
be going on.
Another test might be to replace the arcs with
rectangles or ellipses and see if the performance
changes...
...jim
On 3/27/15 1:52 PM, Robert Krüger wrote:
Hi,
I have a super-simple animation implemented using
AnimationTimer
and Canvas where the canvas just performs a few
draw operations, i.e. fills the screen with a
color and then draws and fills 2-3 circles and I
have already observed that each drawing operation
I add results in
significant CPU load (e.g. when I draw < 10 arcs
in addition to the circles, the CPU load goes up
to 30-40% on a Mac Book Pro for a Canvas size of
600x600(!)).
Now I tested the animation in full screen mode
(only
with a few circles) and playback is unusable for a
serious application (very choppy). Is 2D canvas
performance known to be very bad on Mac or am I
doing something wrong? Are there workarounds for
this?
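
One way to put a number on "choppy" is to count frames per second from the timestamps the animation callback receives. The sketch below is an assumption about how one might do it, not code from any project in this thread: the class name is hypothetical, and it expects nanosecond timestamps such as the value passed to AnimationTimer.handle(long now).

```java
public final class FpsCounter {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private long windowStart = -1; // start of the current 1-second window
    private int frames;            // frames seen in the current window
    private int lastFps;           // most recent completed reading

    /**
     * Call once per frame with a nanosecond timestamp.
     * Returns true when a new one-second reading is available via fps().
     */
    public boolean onFrame(long nowNanos) {
        if (windowStart < 0) {
            windowStart = nowNanos; // first call only starts the clock
            return false;
        }
        frames++;
        if (nowNanos - windowStart >= ONE_SECOND_NANOS) {
            lastFps = frames;
            frames = 0;
            windowStart = nowNanos;
            return true;
        }
        return false;
    }

    /** Frames counted in the last completed window. */
    public int fps() {
        return lastFps;
    }
}
```

Inside handle(long now) one would call onFrame(now) after drawing and print fps() whenever it returns true.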
Thanks,
Robert
--
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG
www.lesspain-software.com