Here is the breakdown of the performance issues I have. The ones I think will 
lead to decent wins are starred, and the Super Shader is triple-starred. This 
list was pulled from the JIRA filter I previously sent. The point of this post 
is to give everybody an easy-to-see list of performance-related issues (as of 
a month ago or so). Some of these might already be done, and this isn't meant 
to be comprehensive (although when I compiled it I did visit each and every 
issue labeled "performance", so it was pretty comprehensive!).

Interested in helping out? I'll be glad to give background on any one of these 
issues and pointers as to how to go about working on any of them.

Richard

Architecture
        • *RT-9363*: Consider reducing conversions between 'FX' API and scene 
graph API
        • *RT-24582*: High frequency refresh and Heavy but low priority updates 
in the same app (multithreaded render, multi instance…)
        • *RT-26492*: Use GCC link time optimization to reduce binary size
        • *RT-26531*: Provide independent stage performance

        • RT-15083: Replace boolean fields with bit fields
        • RT-20397: Remove PGNodes
        • RT-23470: Replace java.lang.Math usage in places where precision is 
not as important
        • RT-23741: Add a hint to let scene graph and Prism know that we are 
animating
        • RT-23866: Optimize Raspberry PI build for armv6/VFP
        • RT-23867: Mac Glass uses gcc -O3 which is known to produce code with 
large static footprint
        • RT-23868: Glass: Consider collapsing Event classes into a single one.
        • RT-24238: Analyze property getters
        • RT-29861: Consider replacing Math functions with a faster alternative
        • RT-29900: Increased CPU usage on application iconified
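
RT-15083 proposes replacing boolean fields with bit fields. In a JVM object each 
boolean field typically occupies at least a byte of the object layout, so a class 
with many flags pays for every one of them; packing them into a single int keeps 
that cost fixed. Below is a minimal sketch of the masking technique, written in 
Python for brevity. The flag names are hypothetical and are not the actual scene 
graph fields.

```python
# Hypothetical flag names for illustration; not the real JavaFX Node fields.
VISIBLE = 1 << 0
MANAGED = 1 << 1
DIRTY = 1 << 2
FOCUSED = 1 << 3


class PackedFlags:
    """Stores several boolean properties in one int instead of separate fields."""

    __slots__ = ("_bits",)

    def __init__(self, bits: int = 0) -> None:
        self._bits = bits

    def get(self, mask: int) -> bool:
        # A flag is set when its bit is 1.
        return (self._bits & mask) != 0

    def set(self, mask: int, value: bool) -> None:
        # Set or clear a single bit without touching the others.
        if value:
            self._bits |= mask
        else:
            self._bits &= ~mask

    @property
    def bits(self) -> int:
        return self._bits


flags = PackedFlags(VISIBLE | MANAGED)
flags.set(DIRTY, True)
flags.set(MANAGED, False)
print(flags.get(VISIBLE), flags.get(MANAGED), flags.get(DIRTY), bin(flags.bits))
# prints: True False True 0b101
```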

Decora
        • RT-2892: Improve performance of Gaussian-based effects
        • RT-2908: Use scaled kernel to improve DropShadow performance for node 
scale factors < 1
        • RT-5347: Prism: finish drop/inner shadow optimizations
        • RT-5420: DropShadow effects significantly affect performance
        • RT-6935: ColorAdjust effect consumes a lot of memory which could lead 
to OOM exception
        • RT-8890: Merge and some Blend effects should optimize rendering to 
destination
        • RT-9225, RT-9226, RT-9227: Various effects don't limit the size of 
the input image when requests are outside the clip
        • RT-9432: Some of the hand-tuned software effect peers are not 
optimized for use with transformed inputs
        • RT-9433: The auto-generated software peers for the effects filters do 
not handle transformed inputs optimally
        • RT-9434: Reflection effect does not clip its output image to the 
requested clip bounds
        • RT-9437: Prism and Hardware Swing pipelines could perform 
PerspectiveTransform directly
        • RT-13714: Implement ColorAdjust as a matrix multiplication
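
RT-13714 suggests implementing ColorAdjust as a matrix multiplication. The appeal 
is that several per-pixel adjustments collapse into one affine color matrix, so 
each pixel is transformed once instead of once per adjustment. The sketch below 
uses a common luminance-weighted saturation matrix and an additive brightness 
offset; these formulas are assumptions for illustration and are not necessarily 
the ones Decora's ColorAdjust uses.

```python
# Rec. 709 luma weights; a common choice for saturation matrices (assumption).
LUMA = (0.2126, 0.7152, 0.0722)


def saturation_matrix(s):
    """3x4 affine color matrix blending each channel toward luma by factor s."""
    return [
        [(1 - s) * LUMA[j] + (s if i == j else 0.0) for j in range(3)] + [0.0]
        for i in range(3)
    ]


def brightness_matrix(b):
    """Identity color matrix plus a constant offset b added to every channel."""
    return [[1.0 if i == j else 0.0 for j in range(3)] + [b] for i in range(3)]


def compose(second, first):
    """Single matrix equivalent to applying `first`, then `second`."""
    out = []
    for i in range(3):
        row = [sum(second[i][k] * first[k][j] for k in range(3)) for j in range(3)]
        offset = sum(second[i][k] * first[k][3] for k in range(3)) + second[i][3]
        out.append(row + [offset])
    return out


def apply(m, rgb):
    """Transform one RGB pixel; no clamping, so composition stays exact."""
    return tuple(sum(m[i][j] * rgb[j] for j in range(3)) + m[i][3] for i in range(3))


pixel = (0.8, 0.4, 0.2)
two_pass = apply(brightness_matrix(0.1), apply(saturation_matrix(0.5), pixel))
one_pass = apply(compose(brightness_matrix(0.1), saturation_matrix(0.5)), pixel)
print(two_pass, one_pass)  # equal up to floating-point rounding
```

Clamping to [0, 1] would be done once at the end. If separate passes clamp in 
between, the composed matrix can give different results for out-of-range 
intermediate values.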

Text
        • *RT-23467*: Evaluate Native Text Engines
        • *RT-23578*: Consider pre-populating the glyph cache with data for the 
default font at the default size(s)
        • *RT-23705*: Reduce the amount of glyph data copied via Java from 
native to see if it helps performance
        • *RT-23708*: Investigate if a segmented glyph cache can help 
performance
        • *RT-30158*: Investigate String Measurement in FX (cache results, call 
less, …)

        • RT-5069: Text node computes complete text layout, even if clipped to 
a much smaller size
        • RT-6475: Need new hints to control how Text node is rendered
        • RT-21269: Font#loadFont(String,double) downloads file in the main 
thread
        • RT-23579: Consider using a fixed interval for glyph cache for faster 
computation
        • RT-23580: Add a variant of text smoothing to deal with rotated text 
at higher versus lower quality
        • RT-24329: LCD font smoothing performance
        • RT-24565: Beagle: Complex Text implementation generates big swing in 
frame rate
        • RT-24941: 8.0-graphics-scrum-h90: GlyphCache.render() takes up to 
200ms which results in jerky rendering
        • RT-26111: Use glyph bounding boxes to get visual bounds
        • RT-26894: String rendering is less performant than java2D one
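
RT-30158 (and, in spirit, RT-23578) is about doing less text work by caching. 
Below is a minimal sketch of memoizing string measurement. The `measure_width` 
function and its per-character advance table are hypothetical stand-ins; a real 
implementation would key on the actual font object and would need to bound and 
invalidate the cache.

```python
from functools import lru_cache

# Hypothetical per-character advances in pixels, assumed to be for a 12pt font.
ADVANCES = {"i": 3.0, "l": 3.0, "m": 11.0, "w": 10.0, " ": 4.0}
DEFAULT_ADVANCE = 7.0

calls = {"layout": 0}


def measure_width_uncached(font: str, size: float, text: str) -> float:
    """Stand-in for an expensive text layout call; counts how often it runs."""
    calls["layout"] += 1
    scale = size / 12.0
    return sum(ADVANCES.get(ch, DEFAULT_ADVANCE) for ch in text) * scale


@lru_cache(maxsize=1024)
def measure_width(font: str, size: float, text: str) -> float:
    """Memoized measurement keyed on (font, size, text), with LRU eviction."""
    return measure_width_uncached(font, size, text)


# A layout pass that measures the same labels repeatedly lays each out only once.
for _ in range(100):
    for label in ("OK", "Cancel", "Apply"):
        measure_width("System", 12.0, label)
print(calls["layout"])  # prints 3
```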

Scene Graph
        • *RT-23346*: Provide API access to multiple hardware screen layers

        • RT-5477: Improve performance and reduce garbage when animating 
gradients
        • RT-5525: Group will get bounds change notification when child's 
bounds change, even if change in child didn't alter bounds of Group
        • RT-9390: Improve picking performance using Dmitri's algorithm (or 
other)
        • RT-9571: Consider adding image caching for images loaded from remote 
URLs
        • RT-10604: Recomputing bounds when effects are used even if not dirty
        • RT-10681: Reevaluate only changed KeyFrames
        • RT-12105: Fix for RT-11562 disables an optimization for calculating 
content bounds
        • RT-12136: SortedList possible optimizations
        • RT-12137: FilteredList possible optimizations
        • RT-12564: Layout spends considerable time in getManagedChildren
        • RT-12715: Node.toBack()/toFront() are inefficient
        • RT-13593: Performance of PathTransition sucks
        • RT-19221: Padding for round cap could be optimized in Line
        • RT-19222: Optimize impl_configShape of Path
        • RT-20455: Do not always recreate the whole geometry in calls to 
impl_configShape
        • RT-23312: OutOfMemoryError after pressing Ctrl+Alt+Del or minimizing 
the window whilst animating a canvas
        • RT-24587: Changing a single child of FlowLayout is slower than 
changing all children
        • RT-26007: Mouse event post-processing does unnecessary work, may be 
incorrect altogether
        • RT-29717: Do not wrap notifications in ObservableList wrappers when 
no listeners are set
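
RT-29717 is one instance of a general pattern: don't build change-notification 
objects when nobody is listening. Below is a minimal sketch using a hypothetical 
`ObservableListSketch` class, not the real `javafx.collections` API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Change:
    """Describes one add operation: where it happened and what was added."""
    start: int
    added: tuple


class ObservableListSketch:
    def __init__(self) -> None:
        self._items: List[object] = []
        self._listeners: List[Callable[[Change], None]] = []
        self.changes_built = 0  # instrumentation for the example

    def add_listener(self, listener: Callable[[Change], None]) -> None:
        self._listeners.append(listener)

    def add_all(self, *items: object) -> None:
        start = len(self._items)
        self._items.extend(items)
        if not self._listeners:
            return  # fast path: no listeners, so no Change object is allocated
        change = Change(start, tuple(items))
        self.changes_built += 1
        for listener in list(self._listeners):
            listener(change)

    def __len__(self) -> int:
        return len(self._items)


quiet = ObservableListSketch()
quiet.add_all("a", "b")

observed = ObservableListSketch()
seen = []
observed.add_listener(seen.append)
observed.add_all("a", "b")
print(quiet.changes_built, observed.changes_built)  # prints 0 1
```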

Prism
        • *RT-15118*: Need to consider architectural changes for doing 
transforms in prism
        • *RT-15839*: Complex animated content in a ScrollPane is jerky 
although little is seen
        • *RT-17396*: Shader based 2D path rendering
        • *RT-17582*: Render the scene using retained data structures
        • *RT-20356*: PresentingPainter and UploadingPainter disregarding dirty 
clip rect
        • *RT-20405*: Improve Path rendering performance
        • *RT-23371*: FB: Render windows on separate hardware layers
        • *RT-23450*: Improve performance of Prism rendering and clipping
        • *RT-23462*: Create "CommandBuffer" for storing graphics drawing 
commands in Prism
        • *RT-24168*: View.uploadPixels could take a source rectangle to upload 
only a portion of the pixels
        • *RT-30271*: No culling if the only dirty region contains the clip
        • *RT-30361*: Consider rendering directly to frame buffer instead of RTT
        • *RT-30440*: Eliminate redundant OpenGL calls
        • ***RT-30741***: Super Shader
        • *RT-30746*: don't fill transparent rectangles, cache more textures 
to avoid buffer flush
        • *RT-30748*: Use Vertex Shader to provide clipping instead of Scissor 
test

        • RT-5835: Fix for RT-5788 disabled an optimization for anti-aliased 
rectangles
        • RT-6968: Prism should support 2-byte gray-alpha .png format
        • RT-8722: Strokes and fills of Paths slower than flash
        • RT-9682: Optimize shadow effects for rounded rectangles
        • RT-10369: Optimize blurs in shaders
        • RT-12400: Delays in D3D Present affect performance
        • RT-14058: Consider possibility to eliminate using of 
BasicStroke.tmpMiter
        • RT-14216: MultipleArrayGradient uses a lot of memory
        • RT-14358: Insertion sort in OpenPisces ScanlineIterator may be very 
inefficient
        • RT-14421: Branch YCbCr shader may reduce performance on slower 
hardware
        • RT-15516: image data associated with cached nodes that are removed 
from a scene are not aggressively released
        • RT-17507: Optimize non-uniform round rect rendering in Regions
        • RT-17510: Improve performance of rendering a TRANSPARENT stage on 
Windows 7
        • RT-17551: MacOS: Optimize using lockFocusIfCanDraw
        • RT-18060: Evaluate whether enabling multithreaded GL engine on Mac 
benefits Mac JFX performance
        • RT-18140: Consider using nearest-neighbor when smooth=false for SW 
pipeline to improve performance
        • RT-18417: Investigate Mac runtime code for possible native code 
optimizations using GCD (Grand Central Dispatch)
        • RT-19556: Consider removing usage of DirectByteBuffer and 
ByteBuffer.allocateDirect
        • RT-19576: Pixel readback performance for the ES2 pipeline has room 
for improvement
        • RT-21025: iOS: DirtyAreaTest on iOS is slower than we like
        • RT-22430: Use 'fillQuad' vs. 'fillRect' for pixel aligned rectangular 
regions
        • RT-22431: Optimize Charts drawing to use filled quads
        • RT-23464: Reduce Vertex Buffer Overhead: Constant Color Attribute vs. 
Array Color Attributes
        • RT-23465: Using TriangleStrip instead of Triangles
        • RT-23466: Improve Vertex Buffer Usage: Structure of Arrays vs. Array 
of Structures
        • RT-23471: Add new Etched effect
        • RT-23574: Add support for tiled rendering of textures (both for 
performance and functional reasons)
        • RT-23575: Need a more compact representation for text data
        • RT-23576: Ability to add hand-coded shaders (bypassing JSL)
        • RT-23577: Support for geometry shaders on graphics chips that support 
it
        • RT-23581: Add ability to render 9-slice directly in Prism graphics
        • RT-23725: Beagleboard: Execute fragment shader on the GPU causes 
significant drop in performance
        • RT-23742: Gradient is slow on embedded systems
        • RT-24104: Native Pisces rasterizer is slower on desktop Linux 
platforms
        • RT-24339: Add a short-cut to dirty region code based on parent / 
child bounds ratio
        • RT-24557: ImagePattern is slow on embedded systems
        • RT-24624: prism-sw pipeline is up to 90% worse than j2d pipeline
        • RT-25166: Path updates in a ScrollPane where content has a Scale 
transform are 100 times slower
        • RT-25603: Mac optimization: Investigate layers async vs sync setting
        • RT-25694: Rewrite (AA)TessShapeRep classes in order to avoid 
unnecessary translations
        • RT-25864: New "shared textures" do not share pixel update flags as 
well as they should
        • RT-26531: Provide independent stage performance
        • RT-28222: Don't render transparent rectangles
        • RT-28305: NGRegion optimizations based on Color.TRANSPARENT are 
ineffective
        • RT-28670: Create a roundrect renderer that uses the new "texture 
primitive" based shaders used currently for ellipses and rects
        • RT-28752: Mac: 8.0-graphics-scrum-792: up to 30% performance 
regression on MacOS
        • RT-29542: FX 8 3D: Mesh computation code needs major clean up or redo
        • RT-30360: Create fewer temporary objects in Quantum
        • RT-30589: preprocess remove comments from ES2 3D shaders
        • RT-30710: 8.0-graphics-scrum-1194: 20% performance regression in 
Bitmap benchmarks in SW pipeline
        • RT-30745: Remove Flush & Finish in ES2SwapChain
        • RT-30747: Introduce a low cost clipping API for simple rectangle 
based clipping
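
Several Prism items above (RT-30747, RT-30748, and the starred RT-30746) are about 
avoiding pipeline state changes and buffer flushes in simple cases. One way to 
handle an axis-aligned rectangular clip without toggling the scissor test is to 
clip each textured quad's geometry, and its texture coordinates, before it goes 
into the vertex buffer. The sketch below assumes axis-aligned, untransformed 
quads; it is not how Prism implements it, and RT-30748 proposes doing this work 
in a vertex shader instead.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """Axis-aligned textured quad: screen rectangle plus matching texture rectangle."""
    x0: float
    y0: float
    x1: float
    y1: float
    u0: float
    v0: float
    u1: float
    v1: float


def clip_quad(q: Quad, cx0: float, cy0: float, cx1: float, cy1: float) -> Optional[Quad]:
    """Clip q to [cx0, cx1] x [cy0, cy1]; return None if nothing is visible."""
    nx0, ny0 = max(q.x0, cx0), max(q.y0, cy0)
    nx1, ny1 = min(q.x1, cx1), min(q.y1, cy1)
    if nx0 >= nx1 or ny0 >= ny1:
        return None  # fully clipped: skip the draw entirely

    def lerp(a: float, b: float, t: float) -> float:
        return a + (b - a) * t

    # Map the clipped edges back into texture space proportionally.
    w, h = q.x1 - q.x0, q.y1 - q.y0
    return Quad(
        nx0, ny0, nx1, ny1,
        lerp(q.u0, q.u1, (nx0 - q.x0) / w),
        lerp(q.v0, q.v1, (ny0 - q.y0) / h),
        lerp(q.u0, q.u1, (nx1 - q.x0) / w),
        lerp(q.v0, q.v1, (ny1 - q.y0) / h),
    )


quad = Quad(0, 0, 100, 50, 0.0, 0.0, 1.0, 1.0)
print(clip_quad(quad, 50, 0, 200, 25))      # right half, top half of the texture
print(clip_quad(quad, 200, 200, 300, 300))  # prints None
```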

Media
        • RT-11379: video playing with MediaPlayer slows down refreshes to 
Java2D component
        • RT-16420: MediaPlayer/View loses frames from video streams encoded at 
25,30,60 fps
        • RT-17861: Use shaders to assist video decoding on the GPU
        • RT-20890: Too many open files and Memory leak

Web
        • RT-24320: WebView draws entire back buffer on screen upon every 
repaint
        • RT-24998: Please enable Javascript JIT for 64 bit

        • RT-16848: Optimize Unicode implementation
        • RT-18909: Extend support for composite operations in Prism Graphics
        • RT-19625: Better support for Webnode to improve rendering performance
        • RT-20501: Prism needs to provide proper APIs to support the Webnode 
team to improve webnode rendering performance
        • RT-21629: Slow and never-ending rendering of page
        • RT-21722: html5 video inside is slow
        • RT-22008: Zero size WCGraphicsPrismContext.Layer handling is not 
perfectly efficient
        • RT-30083: netflix.com: vertical scrollbar is tremendously slow

Threading
        • *RT-2893*: Enable multi-threaded processing of software-based effects 
when >= 2 cores available
        • *RT-26702*: Poor DisplacementMap effect performance on Mac

Interop
        • RT-22133: Performance: JavaFX Webview 
QuantumRenderer$PipelineRunnable.run() and WinApplication._runLoop() take up 
more than half the time in a JDeveloper operation
        • RT-22567: Minor tweaks to FX/Swing painting
        • RT-22705: Simple animation runs at lower FPS when embedded into 
JFXPanel
        • RT-24278: JFXPanel with simple animation consumes entire CPU core
        • RT-26993: Noticeable jerkiness when running JFXPanelBitmapBenchmark 
on MacOS

Benchmarks
        • RT-7644: Math.floor and Math.ceil take up a lot of cpu time

Controls
        • *RT-24105*: TabPane renders content of all tabs even if only one is 
active
        • *RT-30452*: Setting clip on TableCellSkinBase is incorrect
        • *RT-30552*: Label: resolve LabelSkinBase's use of clips for text
        • *RT-30568*: Reduce unnecessary calls to setManaged(true) in Controls
        • *RT-30576*: Parent: add new public layout method, optimized to only 
layout this parent and its children
        • *RT-30648*: Investigate API for TabPane's Tab Content Loading policy

        • RT-9094: VirtualFlow requests data from model too frequently
        • RT-10034: Performance optimizations around SelectionModel 
implementations
        • RT-13792: Investigate caching in controls (NOTE: Unlikely to be any 
win)
        • RT-16529: Memory Leak: event handlers of root TreeItem are not removed
        • RT-16853: TextArea: performance issue
        • RT-18934: TextArea.appendText/deleteText may be very slow
        • RT-20101: [ComboBox] Custom string converter is applied too many times
        • RT-23825: Controls need a lifecycle API
        • RT-24102: CSS Loading: Split caspian.css into multiple smaller 
component parts.
        • RT-25652: Memory Leak in TabPane
        • RT-25801: 8.0-controls-scrum-h81: 25% performance regression in 
Controls.RadioButton on mac-low end machine
        • RT-26716: Performance of scrolling TreeView tail is much slower than 
when scrolling TreeView head
        • RT-26999: 8.0-controls-scrum-h122: up to 20% regression in some 
Controls.TableView benchmarks
        • RT-27725: 8.0-controls-scrum-h186: 22% footprint increase in 
ChoiceBox control
        • RT-27986: Spinning progress indicator overlapping an image plays 
havoc with RDP
        • RT-29055: java.lang.OutOfMemoryError: Java heap space error in 
switching from caspian to modena theme in Modena App
        • RT-30305: 8.0-controls-scrum-569: 42% performance regression in 
Controls.ListView-Keyboard
        • RT-30713: VirtualFlow creates new cells in some instances
        • RT-30824: TableView TableCell memory issue in javaFX 8.x

Embedded
        • *RT-30721*: Provide flag to turn on PRESERVED mode in EGL
        • *RT-30722*: Provide an option for 16-bit opaque frame buffer on the 
Raspberry PI
        • *RT-30723*: EGL: Disable clipping when clearing frame buffer

        • RT-24685: Virtual keyboard initialization is slow
        • RT-24937: Use a C/C++ compiler that can take advantage of NEON
        • RT-25943: Need to consider specific OpenGL extension on embedded 
system
        • RT-25995: Prism porting layer function to query platform VRAM
        • RT-27590: Evaluate effect of ProGuard on runtime size
        • RT-28012: EGLFB: RAM allocation should be reduced
        • RT-28029: Improve EGLFB dialog / popup response time
        • RT-30719: Enable video underlays on Raspberry PI

CSS
        • *RT-28966*: CSS creates new objects for complex values which trigger 
redundant processing including rendering
        • *RT-30381*: fx8.0-b86: CSS code for modena css rules with multiple 
selectors is not optimized

        • RT-11506: Short circuit CSS if CSS is not relevant to the Node
        • RT-11881: Some css selectors in caspian.css will turn the CSS 
processing on for all the parents
        • RT-11882: Under current conditions, every Node is processing CSS
        • RT-23468: Remove use of List in CSS internals in favor of arrays
        • RT-30817: lazy deserialization of css declarations
        • RT-30818: CSS: Avoid creating ObservableList for declarations and 
selectors in Rule

FXML
        • *RT-23527*: Compile FXML to .class file

Tooling
        • RT-13312: Develop GLBenchmark to get baseline performance on any 
particular hardware
        • RT-13313: Performance framework (GPU usage)
        • RT-18326: Implement performance counters (prism.printStats) feature 
for prism-es2 pipe
        • RT-26560: Option to track texture memory allocation
        • RT-30651: 8.0-graphics-scrum-1216: full speed mode seems to be broken

Startup
        • RT-14930: JNLP-start consumes large amount of time
        • RT-20159: Startup regression in controls scrum #371



On Jul 3, 2013, at 9:56 AM, Richard Bair <richard.b...@oracle.com> wrote:

>> Obviously there's a lot going on with the move to Gradle, but we are a few 
>> lines of Gradle build code away from JFX on iOS. I'm keen to find out just 
>> how well it will run. 
> 
> In the runs I've seen (not on RoboVM) the main bottleneck is in graphics 
> rendering. We don't know specifically why yet, but we have a lot of ideas. 
> Now that Tobi reports FX + RoboVM (including fonts!) is working, I'm eager to 
> see the performance characteristics as well.
> 
> With the work you've done on the developer workflow and now that we've got an 
> open build running on the device, we are going to need to get organized 
> around measuring, reporting, and fixing performance issues encountered on the 
> device. Likely some of it will be RoboVM related, but there is plenty of 
> optimization to do in Prism as well.
> 
> We've learned a lot about embedded hardware over the last year or so. Some of 
> the things we've learned:
>       - It is almost *always* fill-rate limited
>       - Pixel shader complexity costs you
>       - CPU -> GPU bandwidth is very limited
> 
> Solving the fill-rate issue is huge. The Android team reckons that you can 
> overwrite the same pixel maybe 2x before you start noticeably losing 
> performance; at 3x or more you're dead. It doesn't even matter what you are 
> doing per pixel (it could simply be filling each pixel with a solid color). 
> Just running a pixel shader for 3x or 4x the number of pixels taxes the 
> hardware.
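
Those 2x/3x thresholds are easier to act on if you can measure where a frame 
stands. One way is to count how many times each pixel is written. Below is a 
minimal sketch, assuming every draw is an axis-aligned rectangle in integer 
pixel coordinates; a real tool would instrument the renderer or the GPU instead.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def overdraw(width: int, height: int, draws: List[Rect]) -> Tuple[float, int]:
    """Return (average writes per pixel, maximum writes for any single pixel)."""
    counts = [[0] * width for _ in range(height)]
    for x, y, w, h in draws:
        # Clamp each rectangle to the render target before counting.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                counts[row][col] += 1
    total = sum(sum(r) for r in counts)
    peak = max(max(r) for r in counts)
    return total / (width * height), peak


W, H = 64, 48  # small stand-in for a screen
full = (0, 0, W, H)
# Three full-screen passes, then one content rectangle covering half the screen.
avg, peak = overdraw(W, H, [full, full, full, (0, 0, W // 2, H)])
print(avg, peak)  # prints 3.5 4
```
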
> 
> So, for example, right now I believe we are doing 3x overdraw before we draw 
> any content. I think first we do a clear, then we fill with black, then we 
> fill with the Scene fill color. Only then do we draw whatever you give us. 
> Obviously this is not optimal!
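
Overdraw from fills that a later opaque fill completely covers can be removed 
before anything reaches the GPU. The sketch below, built around a hypothetical 
`Fill` record, walks the draw list from last to first and drops any fill that is 
fully transparent or fully covered by a single later opaque rectangle. That is 
enough for full-screen background passes like the ones described above, but it 
is far from general occlusion culling. It also treats the clear as just another 
opaque fill, which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fill:
    """A solid rectangle fill; alpha ranges from 0.0 (invisible) to 1.0 (opaque)."""
    name: str
    x: int
    y: int
    w: int
    h: int
    alpha: float


def contains(outer: Fill, inner: Fill) -> bool:
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def cull(draws: List[Fill]) -> List[Fill]:
    """Drop transparent fills and fills fully covered by a later opaque fill."""
    kept: List[Fill] = []
    occluders: List[Fill] = []  # opaque fills drawn after the current one
    for d in reversed(draws):
        if d.alpha == 0.0:
            continue  # draws nothing (compare RT-28222)
        if any(contains(o, d) for o in occluders):
            continue  # every pixel it touches is overwritten later
        kept.append(d)
        if d.alpha == 1.0:
            occluders.append(d)
    kept.reverse()  # restore the original painter's order
    return kept


W, H = 1920, 1080
frame = [
    Fill("clear", 0, 0, W, H, 1.0),
    Fill("black", 0, 0, W, H, 1.0),
    Fill("scene fill", 0, 0, W, H, 1.0),
    Fill("button", 100, 100, 200, 40, 1.0),
    Fill("invisible", 0, 0, 50, 50, 0.0),
]
print([d.name for d in cull(frame)])  # prints ['scene fill', 'button']
```
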
> 
> As for pixel shader complexity: you can probably get away with more complex 
> pixel shaders if they only run once per pixel, but when they run 3x or 4x per 
> pixel, their complexity burns you. We have already done a lot of optimization 
> here, so hopefully this one is in good shape. But it's something to be aware 
> of.
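
A rough way to see why overdraw and shader complexity multiply rather than add: 
per frame, fragment work is approximately the target's width W times its height 
H times the average overdraw k (writes per pixel) times the per-fragment shader 
cost s. This is a simplified model for reasoning, not a measured formula; it 
ignores early fragment rejection, tile-based GPU behavior, and memory bandwidth.

```latex
C_{\text{frame}} \approx W \cdot H \cdot k \cdot s,
\qquad
1920 \cdot 1080 \cdot 3 = 6\,220\,800 \ \text{fragments per frame}
```

At 60 frames per second, the 3x background overdraw described above is about 373 
million fragment shader invocations per second before any content is drawn, and 
each one pays the cost s. Halving k saves exactly as much as halving s.
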
> 
> The CPU -> GPU bandwidth problem is one that is systemic with all these 
> mobile devices. Higher bus speeds == less battery life, so the devices are 
> designed with low bus speeds and this makes transfer of data between CPU and 
> GPU costly. Games will typically do all the transfer once up front (all the 
> graphics assets for a level are loaded up front) and then during the game 
> they are just adjusting the viewport & vertices (often in vertex shaders so 
> as not to pass much data down to the card), etc. Right now we are doing a 
> tremendous amount of communication with the GPU. Ironing this out is the 
> basis for the "super shader" (https://javafx-jira.kenai.com/browse/RT-30741).
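
To put the CPU -> GPU point in numbers, compare re-uploading per-vertex data 
every frame with uploading geometry once and sending only a transform per frame. 
The vertex layout below (2 position floats, 4 color floats, 2 texture-coordinate 
floats) and the quad count are assumptions chosen for illustration, not Prism's 
actual format.

```python
FLOAT_BYTES = 4
FLOATS_PER_VERTEX = 2 + 4 + 2  # position (x, y), color (r, g, b, a), texcoord (u, v)
VERTICES_PER_QUAD = 4


def per_frame_upload_bytes(quads: int) -> int:
    """Bytes sent when every quad's vertices are re-uploaded each frame."""
    return quads * VERTICES_PER_QUAD * FLOATS_PER_VERTEX * FLOAT_BYTES


def transform_only_bytes() -> int:
    """Bytes sent when geometry stays on the GPU and only a 4x4 float matrix changes."""
    return 4 * 4 * FLOAT_BYTES


quads, fps = 10_000, 60
streamed = per_frame_upload_bytes(quads) * fps
retained = transform_only_bytes() * fps
print(f"streamed: {streamed / 1e6:.1f} MB/s, retained: {retained} B/s")
# prints "streamed: 76.8 MB/s, retained: 3840 B/s"
```

In this layout, moving per-vertex color to a constant attribute (the idea behind 
RT-23464) would cut FLOATS_PER_VERTEX from 8 to 4 and halve the streamed traffic.
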
> 
> I would recommend that anybody interested in performance keep the "Open 
> Performance Issues" filter on their JIRA dashboard. It currently links to 221 
> performance issues (most of which are ideas about things to do to improve 
> performance). We also need to close the loop on the jerkiness issues we were 
> discussing a couple of weeks ago.
> 
> Richard
