Hi Richard,

Wow! Thanks - you really know your stuff!
Yes, it's not a "one LOC change" and I get that it has a lot to do with the difficult marriage of the 2D and 3D worlds. It does seem like a large project in itself to solve these problems, but I really believe it *has* to be done, and I intend to do whatever I can to achieve at least a linear relationship between CPU/GPU cores/grunt and JavaFX performance. At the moment it doesn't seem to matter *what* hardware you throw at it: JavaFX scene graph performance is almost static.

The issues you have highlighted here will be very useful indeed in making this happen. I think I might have mentioned to you privately that the Qt rendering pipeline had similar problems but has been greatly optimised over the last couple of releases. I paid very close attention to those issues and how they were resolved, and I'm sure many of the same techniques could be applied here. Anyway - it's certainly worth a try!

Felix

> On 22 Jul 2016, at 02:41, Richard Bair <richard.b...@oracle.com> wrote:
>
> Have you guys profiled the application to see where the CPU time is spent? How many nodes in the app?
>
> In the past the majority of CPU time has been spent in one or more of the following (not sure if it still applies):
> - Computing changed bounds (a lot of work was done to speed this up, but I think it was always a single thread doing the work)
> - Synchronizing state to the render graph (a lot of work was done here too, so I wouldn’t expect this to be the problem area)
> - Walking the render graph to render shapes
> - Rasterization (a lot of optimization went here too, but it is still essentially a CPU-bound operation)
> - Reading results back from the card (this is sometimes the culprit when it is slow on old hardware and fast on new hardware)
>
> These are all CPU-bound tasks.
>
> I think there are two angles to look at. First, we always wanted to break down the render stage into some parallel pipeline.
> Adobe Flex was good at this, where they’d saturate every CPU you had during the CPU-intensive rasterization phase and scene graph computation phase. Depending on your particular test, this might actually be the bottleneck. So the idea here is to optimize the CPU tasks, which will (hopefully) remove the CPU from the bottleneck and allow the GPU to take on more of the burden. You should also do some research or experiments with regard to battery life to make sure using more cores doesn’t make things worse (and if it does, to then have a flag to indicate the amount of parallelism). You also have to be careful because IIRC (and I may not!) if a lot of CPU activity happens on some laptops they’ll kick into a “high performance” mode, which is sometimes what you want, and sometimes not. Games are happy to kick you into that mode (and drain your battery faster as a result), whereas business apps rarely want to do that.
>
> Another angle to look at is more of a research direction in the graphics rendering. We spent quite a lot of time looking into ways to go “shader all the way” and avoid having to use a software rasterizer at all. The state of the art has likely advanced from the last time I looked at it, but at the time there really wasn’t anything we could find that was ready for production in terms of producing 2D screens using 3D that really gave you the polish of 2D. Also, the scene graph semantics are fundamentally painter’s algorithm, since this is what everybody is used to when coming from a 2D background. But that’s not the way it works in 3D. In 3D you feed a bunch of meshes to the video card and it determines which pixels to render and which are occluded and don’t need to be rendered. But when you have multiple geometries at the same z-depth, the card can have “z-fighting”, where it renders the pixels from some items below you and some above.
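The painter's-algorithm ordering described above can be sketched in a few lines. This is a toy illustration with made-up names, not Prism code: a 2D scene graph paints nodes back to front in a deterministic order, whereas a depth buffer resolves visibility per pixel and can z-fight when depths tie.

```java
import java.util.*;

// Minimal sketch of painter's-algorithm ordering (illustrative only, not Prism code).
// A 2D scene graph draws nodes back to front; the last shape painted over a pixel
// wins. A GPU depth buffer instead keeps the fragment with the smallest depth, and
// two fragments at the same depth can "z-fight".
public class PainterSketch {
    record Quad(String name, double z) {}

    // Painter's algorithm: stable sort back-to-front, then paint in that order.
    static List<String> paintOrder(List<Quad> scene) {
        return scene.stream()
                .sorted(Comparator.comparingDouble(Quad::z).reversed()) // farthest first
                .map(Quad::name)
                .toList();
    }

    public static void main(String[] args) {
        List<Quad> scene = List.of(new Quad("background", 10.0),
                                   new Quad("button", 1.0),
                                   new Quad("label", 1.0));  // same depth as "button"
        // The stable sort preserves scene-graph order for equal z, so the result
        // is deterministic, unlike a depth test, where equal depths may z-fight.
        System.out.println(paintOrder(scene)); // [background, button, label]
    }
}
```

Because the sort is stable, nodes at the same depth keep their scene-graph order, which is exactly the guarantee a per-pixel depth test does not give.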
> There are techniques to try to overcome this, but at least the last time we looked at it (according to my increasingly dimming memory!) there wasn’t a really brilliant solution to the problem. Anti-aliasing and transparency were big problems too.
>
> >>>>> DETOUR
>
> Normal things that you would have in 2D like shadows, text, even rounded rectangles have historically been produced using various 2D algorithms and blend modes and so forth. Most people don’t even realize the degree to which their view of what a real 2D screen looks like has been tainted by the techniques that were available for producing those screens. Many (most? all? at least it was at the time) game developers recognized this and used 2D toolkits with CPU rasterization to produce their 2D screens and then overlaid this on 3D content. The normal way to do this is to render the 2D images in Photoshop or something, slice it up, load the PNGs into the graphics card on app startup, and then scale those to produce the images. This is fine, but in a general-purpose toolkit like FX you can’t just do that, because we allow programmatic access to the scene graph and people can modify the “images” in real time. So we draw them and cache them and reuse the cached images whenever possible, etc. A lot was done in order to try to optimize this part of the problem.
>
> When I was benchmarking this stuff, we blew away pretty much everybody who was in the 2D+3D general-purpose graphics toolkit world. We never tried to compete with the game vendors (like Unity). We weren’t trying to be a pure 3D scene graph. There was a huge discussion about this early on in FX, as to how to marry the 2D and 3D worlds. Developers in these different lands come at the problem differently, in terms of how they understand their world (y-up or y-down? 0,0 in the middle? Every scene scaled? Or 0,0 in top left and pixel scaled by default? Anti-aliasing?).
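The "draw, cache, and reuse" scheme mentioned in the detour can be sketched roughly as follows. All names here are hypothetical, not the actual Prism caching code: rasterize a node once, keep the result keyed by the node, and invalidate only when the node changes.

```java
import java.util.*;

// Illustrative sketch (not the real Prism code) of the "draw once, cache, reuse"
// idea: rasterize a node to a texture the first time it is rendered, then reuse
// the cached texture until the node is modified.
public class TextureCacheSketch {
    interface Node { String id(); int version(); }            // version bumps on each change

    record CachedTexture(int version, String pixels) {}

    private final Map<String, CachedTexture> cache = new HashMap<>();
    int rasterizations = 0;                                   // counts the expensive CPU work

    String render(Node node) {
        CachedTexture hit = cache.get(node.id());
        if (hit != null && hit.version() == node.version()) {
            return hit.pixels();                              // cheap path: reuse cached texture
        }
        rasterizations++;                                     // expensive path: re-rasterize
        String pixels = "pixels(" + node.id() + "@v" + node.version() + ")";
        cache.put(node.id(), new CachedTexture(node.version(), pixels));
        return pixels;
    }

    public static void main(String[] args) {
        TextureCacheSketch renderer = new TextureCacheSketch();
        Node rect = new Node() {
            public String id() { return "rect"; }
            public int version() { return 1; }
        };
        renderer.render(rect);
        renderer.render(rect);                                // second call hits the cache
        System.out.println("rasterizations = " + renderer.rasterizations); // prints 1
    }
}
```

The interesting part is the invalidation: because FX exposes the scene graph programmatically, any property change has to bump the node's "version" so the cached texture is re-rasterized, which is why caching alone cannot make an animating scene GPU-bound.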
> We decided it was for 2D developers who wanted advanced graphics and animations, and a better toolkit for building apps (not games). We figured that for people who wanted to program games, we were never going to be really compelling without building out a lot of additional support, way beyond just graphics performance. Looking at Unity you can see where we’d have had to go to be a compelling game platform, and obviously Sun and Oracle are not in that business.
>
> <<<<< END DETOUR
>
> One of the projects I really wanted to do was to modify Prism to take advantage of multiple cores in the computation / rasterization steps. I think doing so would be a pretty major job and would have to be done quite carefully. My guess is that this would help with the problem you are seeing, but I couldn’t be 100% sure without digging into the details of the benchmark and profile.
>
> Richard
>
>> On Jul 21, 2016, at 4:04 AM, Felix Bembrick <felix.bembr...@gmail.com> wrote:
>>
>> I would add that neither JOGL nor LWJGL have these issues.
>>
>> Yes, I know they are somewhat different "animals", but the point is, clearly *Java* is NOT the cause.
>>
>>> On 21 Jul 2016, at 20:07, Dr. Michael Paus <m...@jugs.org> wrote:
>>>
>>> Hi Felix,
>>>
>>> I have written various tests like the ones you use in FXMark and I have obtained similar results. I have even tried to substitute 2D shapes with 3D MeshViews in the hope that this would give better performance, but the results were not that good. Of course all this depends on the specific test case, but in general I see that a JavaFX application which makes heavy use of graphics animations is completely CPU-bound. The maximum performance is reached when one CPU/core is at 100%. The performance of your graphics hardware seems to be almost irrelevant. I could, for example, run four instances of the same test at the same time with almost the same performance.
>>> In this case all 4 cores of my machine were at 100%. This proves that the graphics hardware is not the limiting factor. My machine is a MacBook Pro with Retina graphics and a dedicated NVidia graphics card which is already a couple of years old and certainly not playing in the same league as your high-power card.
>>>
>>> I myself have not yet found a way to really speed up the graphics performance, and I am a little bit frustrated because of that. But it is not only the general graphics performance which is a problem. There are also a lot of other pitfalls into which you can stumble and which can bring your animations to a halt or even crash your system. Zooming, for example, is one of these issues.
>>>
>>> I would like to have some exchange on these issues and how to best address them, but my impression so far is that there are only very few people interested in that. (I hope someone can prove me wrong on this :-)
>>>
>>> Michael
>>>
>>>> On 20.07.16 at 04:14, Felix Bembrick wrote:
>>>>
>>>> Having written and tested FXMark on various platforms and devices, one thing has really struck me as quite "odd".
>>>>
>>>> I started work on FXMark as a kind of side project a while ago and, at the time, my machine was powerful but not "super powerful".
>>>>
>>>> So when I purchased a new machine with just about the highest specs available, including 2 x Xeon CPUs and (especially) 4 x NVIDIA GTX Titan X GPUs in SLI mode, I was naturally expecting to see significant performance improvements when I ran FXMark on it.
>>>>
>>>> But to my surprise, and disappointment, the scene graph animations ran almost NO faster whatsoever!
>>>>
>>>> So then I decided to try FXMark on my wife's entry-level Dell i5 PC with a rudimentary (single) GPU and, guess what - almost the same level of performance (i.e. FPS and smoothness etc.) was achieved on this considerably less powerful machine (in terms of both CPU and GPU).
>>>>
>>>> So, it seems there is some kind of "performance wall" that limits the rendering speed of the scene graph (and this is with full-speed animations enabled).
>>>>
>>>> What is the nature of this "wall"? Is it simply that the rendering pipeline is not making efficient use of the GPU? Is too much being done on the CPU?
>>>>
>>>> Whatever the cause, I really think it needs to be addressed.
>>>>
>>>> If I can't get better performance out of a machine that scores in the top 0.01% of all machines in the world on the 3DMark index than out of an entry-level PC, isn't this a MAJOR issue for JavaFX?
>>>>
>>>> Blessings,
>>>>
>>>> Felix
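For what it's worth, the multi-core rasterization idea Richard raises could look something like the following sketch (all names hypothetical, not Prism code): split the frame into disjoint row bands and rasterize each band on its own worker thread, so the CPU-bound phase scales with the number of cores instead of saturating just one.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of tile/band-parallel rasterization (hypothetical code,
// not Prism): each worker thread owns a disjoint band of rows, so all bands
// can be rasterized concurrently without any locking.
public class TileRasterSketch {
    static int[] rasterizeFrame(int width, int height, int bandHeight) throws Exception {
        int[] frame = new int[width * height];
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<?>> bands = new ArrayList<>();
        for (int ty = 0; ty < height; ty += bandHeight) {
            final int y0 = ty, y1 = Math.min(ty + bandHeight, height);
            bands.add(pool.submit(() -> {
                // Each worker writes only its own rows; no shared mutable state.
                for (int y = y0; y < y1; y++)
                    for (int x = 0; x < width; x++)
                        frame[y * width + x] = shade(x, y); // stand-in for real shape rasterization
            }));
        }
        for (Future<?> band : bands) band.get();            // wait for every band to finish
        pool.shutdown();
        return frame;
    }

    static int shade(int x, int y) { return (x ^ y) & 0xFF; } // toy per-pixel work

    public static void main(String[] args) throws Exception {
        int[] frame = rasterizeFrame(64, 64, 16);
        System.out.println("pixels rendered: " + frame.length); // prints 4096
    }
}
```

As the thread notes, the real job would be much harder than this toy: the scene graph sync, damage regions, and overlapping effects all introduce dependencies between regions, and the battery-life and "high performance mode" concerns argue for making the degree of parallelism configurable.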