Hi Richard,

Wow! Thanks - you really know your stuff!
Yes, it's not a "one LOC change" and I get that it has a lot to do with the difficult marriage of the 2D and 3D worlds. It does seem like a large project in itself to solve these problems, but I really believe it *has* to be done, and I intend to do whatever I can to achieve at least a linear relationship between CPU/GPU cores/grunt and JavaFX performance. At the moment it doesn't seem to matter *what* hardware you throw at it: JavaFX scene graph performance is almost static.

The issues you have highlighted here will be very useful indeed in making this happen. I think I might have mentioned to you privately that the Qt rendering pipeline had similar problems but has been greatly optimised over the last couple of releases. I paid very close attention to those issues and how they were resolved, and I'm sure many of the same techniques could be applied here. Anyway - it's certainly worth a try!

Felix

> On 22 Jul 2016, at 02:41, Richard Bair <richard.b...@oracle.com> wrote:
>
> Have you guys profiled the application to see where the CPU time is spent? How many nodes in the app?
>
> In the past the majority of CPU time has been spent in one or more of the following (not sure if it still applies):
> - Computing changed bounds (a lot of work was done to speed this up, but I think it was always a single thread doing the work)
> - Synchronizing state to the render graph (a lot of work was done here too, so I wouldn’t expect this to be the problem area)
> - Walking the render graph to render shapes
> - Rasterization (a lot of optimization went here too, but it is still essentially a CPU-bound operation)
> - Reading results back from the card (this is sometimes the culprit when it is slow on old hardware and fast on new hardware)
>
> These are all CPU-bound tasks.
>
> I think there are two angles to look at. First, we always wanted to break down the render stage into some parallel pipeline.
> Adobe Flex was good at this, where they’d saturate every CPU you had during the CPU-intensive rasterization phase and scene graph computation phase. Depending on your particular test, this might actually be the bottleneck. So the idea here is to optimize the CPU tasks, which will (hopefully) remove the CPU from the bottleneck and allow the GPU to take on more of the burden. You should also do some research or experiments with regard to battery life to make sure using more cores doesn’t make things worse (and if it does, to then have a flag to indicate the amount of parallelism). You also have to be careful because IIRC (and I may not!) if a lot of CPU activity happens on some laptops they’ll kick into a “high performance” mode, which is sometimes what you want, and sometimes not. Games are happy to kick you into that mode (and drain your battery faster as a result), whereas business apps rarely want to do that.
>
> Another angle to look at is more of a research direction in the graphics rendering. We spent quite a lot of time looking into ways to go “shader all the way” and avoid having to use a software rasterizer at all. The state of the art has likely advanced from the last time I looked at it, but at the time there really wasn’t anything we could find that was ready for production in terms of producing 2D screens using 3D that really gave you the polish of 2D. Also, the scene graph semantics are fundamentally painter’s algorithm, since this is what everybody is used to when coming from a 2D background. But that’s not the way it works in 3D. In 3D you feed a bunch of meshes to the video card and it determines which pixels to render and which are occluded and don’t need to be rendered. But when you have multiple geometries at the same z-depth, the card can have “z-fighting”, where it renders the pixels from some items below you and some above.
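The painter's-algorithm ordering described above can be sketched in a few lines. This is a toy illustration with made-up names, not Prism code: a 2D scene graph paints nodes back to front in a deterministic order, whereas a depth buffer resolves visibility per pixel and can z-fight when depths tie.

```java
import java.util.*;

// Minimal sketch of painter's-algorithm ordering (illustrative only, not Prism code).
// A 2D scene graph draws nodes back to front; the last shape painted over a pixel
// wins. A GPU depth buffer instead keeps the fragment with the smallest depth, and
// two fragments at the same depth can "z-fight".
public class PainterSketch {
    record Quad(String name, double z) {}

    // Painter's algorithm: stable sort back-to-front, then paint in that order.
    static List<String> paintOrder(List<Quad> scene) {
        return scene.stream()
                .sorted(Comparator.comparingDouble(Quad::z).reversed()) // farthest first
                .map(Quad::name)
                .toList();
    }

    public static void main(String[] args) {
        List<Quad> scene = List.of(new Quad("background", 10.0),
                                   new Quad("button", 1.0),
                                   new Quad("label", 1.0));  // same depth as "button"
        // The stable sort preserves scene-graph order for equal z, so the result
        // is deterministic, unlike a depth test, where equal depths may z-fight.
        System.out.println(paintOrder(scene)); // [background, button, label]
    }
}
```

Because the sort is stable, nodes at the same depth keep their scene-graph order, which is exactly the guarantee a per-pixel depth test does not give.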
> There are techniques to try to overcome this, but at least the last time we looked at it (according to my increasingly dimming memory!) there wasn’t a really brilliant solution to the problem. Anti-aliasing and transparency were big problems too.
>
> >>>>> DETOUR
>
> Normal things that you would have in 2D like shadows, text, even rounded rectangles have historically been produced using various 2D algorithms and blend modes and so forth. Most people don’t even realize the degree to which their view of what a real 2D screen looks like has been tainted by the techniques that were available for producing those screens. Many (most? all? at least it was at the time) game developers recognized this and used 2D toolkits with CPU rasterization to produce their 2D screens and then overlaid this on 3D content. The normal way to do this is to render the 2D images in Photoshop or something, slice it up, load the PNGs into the graphics card on app startup, and then scale those to produce the images. This is fine, but in a general-purpose toolkit like FX you can’t just do that, because we allow programmatic access to the scene graph and people can modify the “images” in real time. So we draw them and cache them and reuse the cached images whenever possible, etc. A lot was done in order to try to optimize this part of the problem.
>
> When I was benchmarking this stuff, we blew away pretty much everybody who was in the 2D+3D general-purpose graphics toolkit world. We never tried to compete with the game vendors (like Unity). We weren’t trying to be a pure 3D scene graph. There was a huge discussion about this early on in FX, as to how to marry the 2D and 3D worlds. Developers in these different lands come at the problem differently, in terms of how they understand their world (y-up or y-down? 0,0 in the middle? Every scene scaled? Or 0,0 in top left and pixel scaled by default? Anti-aliasing?).
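The "draw, cache, and reuse" scheme mentioned in the detour can be sketched roughly as follows. All names here are hypothetical, not the actual Prism caching code: rasterize a node once, keep the result keyed by the node, and invalidate only when the node changes.

```java
import java.util.*;

// Illustrative sketch (not the real Prism code) of the "draw once, cache, reuse"
// idea: rasterize a node to a texture the first time it is rendered, then reuse
// the cached texture until the node is modified.
public class TextureCacheSketch {
    interface Node { String id(); int version(); }            // version bumps on each change

    record CachedTexture(int version, String pixels) {}

    private final Map<String, CachedTexture> cache = new HashMap<>();
    int rasterizations = 0;                                   // counts the expensive CPU work

    String render(Node node) {
        CachedTexture hit = cache.get(node.id());
        if (hit != null && hit.version() == node.version()) {
            return hit.pixels();                              // cheap path: reuse cached texture
        }
        rasterizations++;                                     // expensive path: re-rasterize
        String pixels = "pixels(" + node.id() + "@v" + node.version() + ")";
        cache.put(node.id(), new CachedTexture(node.version(), pixels));
        return pixels;
    }

    public static void main(String[] args) {
        TextureCacheSketch renderer = new TextureCacheSketch();
        Node rect = new Node() {
            public String id() { return "rect"; }
            public int version() { return 1; }
        };
        renderer.render(rect);
        renderer.render(rect);                                // second call hits the cache
        System.out.println("rasterizations = " + renderer.rasterizations); // prints 1
    }
}
```

The interesting part is the invalidation: because FX exposes the scene graph programmatically, any property change has to bump the node's "version" so the cached texture is re-rasterized, which is why caching alone cannot make an animating scene GPU-bound.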
> We decided it was for 2D developers who wanted advanced graphics and animations, and a better toolkit for building apps (not games). We figured that for people who wanted to program games, we were never going to be really compelling without building out a lot of additional support, way beyond just graphics performance. Looking at Unity you can see where we’d have had to go to be a compelling game platform, and obviously Sun and Oracle are not in that business.
>
> <<<<< END DETOUR
>
> One of the projects I really wanted to do was to modify Prism to take advantage of multiple cores in the computation / rasterization steps. I think doing so would be a pretty major job and would have to be done quite carefully. My guess is that this would help with the problem you are seeing, but I couldn’t be 100% sure without digging into the details of the benchmark and profile.
>
> Richard
>
>> On Jul 21, 2016, at 4:04 AM, Felix Bembrick <felix.bembr...@gmail.com> wrote:
>>
>> I would add that neither JOGL nor LWJGL have these issues.
>>
>> Yes, I know they are somewhat different "animals", but the point is, clearly *Java* is NOT the cause.
>>
>>> On 21 Jul 2016, at 20:07, Dr. Michael Paus <m...@jugs.org> wrote:
>>>
>>> Hi Felix,
>>>
>>> I have written various tests like the ones you use in FXMark and I have obtained similar results. I have even tried to substitute 2D shapes with 3D MeshViews in the hope that this would give better performance, but the results were not that good. Of course all this depends on the specific test case, but in general I see that a JavaFX application which makes heavy use of graphics animations is completely CPU-bound. The maximum performance is reached when one CPU/core is at 100%. The performance of your graphics hardware seems to be almost irrelevant. I could, for example, run four instances of the same test at the same time with almost the same performance.
>>> In this case all 4 cores of my machine were at 100%. This proves that the graphics hardware is not the limiting factor. My machine is a MacBook Pro with Retina graphics and a dedicated NVidia graphics card which is already a couple of years old and certainly not playing in the same league as your high-power card.
>>>
>>> I myself have not yet found a way to really speed up the graphics performance, and I am a little bit frustrated because of that. But it is not only the general graphics performance which is a problem. There are also a lot of other pitfalls into which you can stumble and which can bring your animations to a halt or even crash your system. Zooming, for example, is one of these issues.
>>>
>>> I would like to have some exchange on these issues and how to best address them, but my impression so far is that there are only very few people interested in that. (I hope someone can prove me wrong on this :-)
>>>
>>> Michael
>>>
>>>> On 20.07.16 at 04:14, Felix Bembrick wrote:
>>>>
>>>> Having written and tested FXMark on various platforms and devices, one thing has really struck me as quite "odd".
>>>>
>>>> I started work on FXMark as a kind of side project a while ago and, at the time, my machine was powerful but not "super powerful".
>>>>
>>>> So when I purchased a new machine with just about the highest specs available, including 2 x Xeon CPUs and (especially) 4 x NVIDIA GTX Titan X GPUs in SLI mode, I was naturally expecting to see significant performance improvements when I ran FXMark on it.
>>>>
>>>> But to my surprise, and disappointment, the scene graph animations ran almost NO faster whatsoever!
>>>>
>>>> So then I decided to try FXMark on my wife's entry-level Dell i5 PC with a rudimentary (single) GPU and, guess what - almost the same level of performance (i.e. FPS and smoothness etc.) was achieved on this considerably less powerful machine (in terms of both CPU and GPU).
>>>>
>>>> So, it seems there is some kind of "performance wall" that limits the rendering speed of the scene graph (and this is with full-speed animations enabled).
>>>>
>>>> What is the nature of this "wall"? Is it simply that the rendering pipeline is not making efficient use of the GPU? Is too much being done on the CPU?
>>>>
>>>> Whatever the cause, I really think it needs to be addressed.
>>>>
>>>> If I can't get better performance out of a machine that scores in the top 0.01% of all machines in the world on the 3DMark index than out of an entry-level PC, isn't this a MAJOR issue for JavaFX?
>>>>
>>>> Blessings,
>>>>
>>>> Felix
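For what it's worth, the multi-core rasterization idea Richard raises could look something like the following sketch (all names hypothetical, not Prism code): split the frame into disjoint row bands and rasterize each band on its own worker thread, so the CPU-bound phase scales with the number of cores instead of saturating just one.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of tile/band-parallel rasterization (hypothetical code,
// not Prism): each worker thread owns a disjoint band of rows, so all bands
// can be rasterized concurrently without any locking.
public class TileRasterSketch {
    static int[] rasterizeFrame(int width, int height, int bandHeight) throws Exception {
        int[] frame = new int[width * height];
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<?>> bands = new ArrayList<>();
        for (int ty = 0; ty < height; ty += bandHeight) {
            final int y0 = ty, y1 = Math.min(ty + bandHeight, height);
            bands.add(pool.submit(() -> {
                // Each worker writes only its own rows; no shared mutable state.
                for (int y = y0; y < y1; y++)
                    for (int x = 0; x < width; x++)
                        frame[y * width + x] = shade(x, y); // stand-in for real shape rasterization
            }));
        }
        for (Future<?> band : bands) band.get();            // wait for every band to finish
        pool.shutdown();
        return frame;
    }

    static int shade(int x, int y) { return (x ^ y) & 0xFF; } // toy per-pixel work

    public static void main(String[] args) throws Exception {
        int[] frame = rasterizeFrame(64, 64, 16);
        System.out.println("pixels rendered: " + frame.length); // prints 4096
    }
}
```

As the thread notes, the real job would be much harder than this toy: the scene graph sync, damage regions, and overlapping effects all introduce dependencies between regions, and the battery-life and "high performance mode" concerns argue for making the degree of parallelism configurable.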