GitHub user 2ooom opened a pull request:

    https://github.com/apache/spark/pull/18783

    [SPARK-21254] [WebUI] History UI performance fixes

    ## What changes were proposed in this pull request?
    
    As described in the JIRA ticket, the History page takes ~1 min to load
    when the number of jobs is 10k+.
    Most of that time is spent on DOM manipulations and the additional costs
    they imply (browser repaints and reflows).
    The goal of this PR is not to change any behavior but to reduce the
    rendering time of the History UI:
    
    1. The most costly operation is setting `innerHTML` for the `duration` column
    within a loop, which is
    [extremely slow](https://jsperf.com/jquery-append-vs-html-list-performance/24).
    [Refactoring](https://github.com/criteo-forks/spark/commit/114943b21a730092aa3249b7a905b240bd46e531)
    this brought page load time **down to 10-15s**.
    
    2. The second big gain, bringing page load time **down to 4s**, was
    [achieved](https://github.com/criteo-forks/spark/commit/f35fdcd5f129339fce75996e9242c88085a9b8ab)
    by detaching the table's DOM before parsing it with the DataTables jQuery plugin.
    
    3. Another set of improvements
    ([1](https://github.com/criteo-forks/spark/commit/332b398db7eb3052484d436919185cb0b62b2385),
    [2](https://github.com/criteo-forks/spark/commit/0af596a547e3a1f2b594a83cbda1f6ef559de86b),
    [3](https://github.com/criteo-forks/spark/commit/235f164178a09e22306f05090ee1ff5f314a6710))
    focused on removing unnecessary DOM manipulations that in total
    contributed ~250ms to page load time.
    
    ## How was this patch tested?
    
    Tested by the existing Selenium tests in
    `org.apache.spark.deploy.history.HistoryServerSuite`. The version of
    HtmlUnitDriver had a bug that prevented the full table from rendering and made the test
    `ajax rendered relative links are prefixed with uiRoot (spark.ui.proxyBase)`
    fail consistently, so we
    [updated](https://github.com/criteo-forks/spark/commit/96598ab50e795fa7937df497bc84deee7fafce47)
    the HtmlUnitDriver version.
    
    Changes were also tested on Criteo's spark-2.1 fork with 20k+ rows in the
    table, reducing load time to 4s.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/criteo-forks/spark history-ui-perf-fix-upstream-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18783
    
----
commit 114943b21a730092aa3249b7a905b240bd46e531
Author: Dmitry Parfenchik <[email protected]>
Date:   2017-06-05T20:42:41Z

    [SPARK-21254][WebUI] Improving performance by detaching table DOM before processing
    
    Currently all the DOM manipulations are handled in a loop after the Mustache
    template is parsed. This causes severe performance issues, especially when
    the loops iterate over thousands of (attempt/application) records, and
    triggers all kinds of unnecessary browser work: reflow, repaint, etc.
    
    This can be easily fixed by preparing a DOM node beforehand and doing all
    manipulations within the loops on the detached node, reattaching it to the
    document only after the work is done.
    
    The most costly operation in this case was setting innerHTML for the `duration`
    column within a loop, which is extremely slow:
    
    https://jsperf.com/jquery-append-vs-html-list-performance/24
    
    Duration parsing, however, can be done before mustache-template processing
    without any additional DOM alterations.
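
A minimal sketch of the idea described in this commit message, assuming each attempt record carries a numeric `duration` in milliseconds. The `formatDuration` and `prepareRows` helpers and the `durationText` field are hypothetical names used for illustration, not the code from the commit. The point is only that the display string is computed on plain data before the template runs, so no per-row `innerHTML` write is needed afterwards.

```javascript
// Hypothetical helper: turn milliseconds into a short human-readable string.
function formatDuration(ms) {
  if (ms < 1000) return ms + " ms";
  const seconds = ms / 1000;
  if (seconds < 60) return seconds.toFixed(1) + " s";
  const minutes = seconds / 60;
  if (minutes < 60) return minutes.toFixed(1) + " min";
  return (minutes / 60).toFixed(1) + " h";
}

// Precompute the display value on the data itself (no DOM access),
// so the template can emit the final cell text in a single pass.
function prepareRows(attempts) {
  return attempts.map(function (attempt) {
    return Object.assign({}, attempt, {
      durationText: formatDuration(attempt.duration)
    });
  });
}
```

The template would then reference the precomputed field directly instead of having a loop patch each rendered cell.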

commit 332b398db7eb3052484d436919185cb0b62b2385
Author: Dmitry Parfenchik <[email protected]>
Date:   2017-07-30T14:52:24Z

    [SPARK-21254][WebUI] Performance optimization for pagination check
    
    The check for whether to display pagination on large data sets (10-20k rows)
    was taking up to 50ms because it iterated over all rows. The same check can
    be done by testing the length of the array before passing it to mustache.
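
As a rough illustration of the array-length check, the sketch below decides on pagination from the data rather than from rendered rows. The `PAGE_SIZE` threshold and both function names are assumptions made for this example; `paging` is the standard DataTables option for enabling or disabling pagination.

```javascript
// Assumed threshold for illustration; the actual value is not stated above.
const PAGE_SIZE = 20;

// Decide on pagination from the data array, before any rows are rendered.
// Reading `length` is constant-time, unlike a check that walks every row.
function needsPagination(rows, pageSize) {
  return rows.length > pageSize;
}

// Build the table options object passed to the table plugin.
function tableOptions(rows) {
  return { paging: needsPagination(rows, PAGE_SIZE) };
}
```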

commit 0af596a547e3a1f2b594a83cbda1f6ef559de86b
Author: Dmitry Parfenchik <[email protected]>
Date:   2017-07-30T15:23:37Z

    [SPARK-21254][WebUI] Removing unnecessary DOM processing
    
    Logic related to the `hasMultipleAttempts` flag:
    
     - Hiding the attemptId column (if `hasMultipleAttempts = false`)
     - Setting a white background color for the first 2 columns (if `hasMultipleAttempts = true`)
    
    was updating the DOM after mustache template processing, which caused 2 unnecessary
    iterations over the full data set (first through a jQuery selector, then through a for-loop).
    
    Moving it inside the mustache template saves 80-90ms on large data sets (10k+ rows).
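
The sketch below illustrates the general approach of deciding column visibility while rendering instead of patching the DOM afterwards. It uses plain string building as a stand-in for the mustache template, and the record shape (`id`, `attempts`, `attemptId`, `sparkUser`) and function names are assumptions for illustration only. Values are assumed to be already safe for HTML output.

```javascript
// Compute the flag once from the data (hypothetical record shape).
function hasMultipleAttempts(apps) {
  return apps.some(function (app) { return app.attempts.length > 1; });
}

// Stand-in for the template: the attempt-id cell is emitted only when
// needed, so nothing has to be hidden or restyled after rendering.
function renderRow(app, attempt, multiple) {
  const cells = ["<td>" + app.id + "</td>"];
  if (multiple) cells.push("<td>" + attempt.attemptId + "</td>");
  cells.push("<td>" + attempt.sparkUser + "</td>");
  return "<tr>" + cells.join("") + "</tr>";
}

// Render all rows in one pass over the data.
function renderRows(apps) {
  const multiple = hasMultipleAttempts(apps);
  const rows = [];
  apps.forEach(function (app) {
    app.attempts.forEach(function (attempt) {
      rows.push(renderRow(app, attempt, multiple));
    });
  });
  return rows.join("");
}
```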

commit 235f164178a09e22306f05090ee1ff5f314a6710
Author: Dmitry Parfenchik <[email protected]>
Date:   2017-07-30T20:06:32Z

    [SPARK-21254][WebUI] further reducing DOM manipulations
    
    Refactoring the incomplete requests filter behavior due to inefficiency in DOM
    manipulations. We were traversing the DOM multiple times just to hide columns
    that we could have avoided rendering in mustache (end date, duration).
    Moving this logic into the mustache template (`showCompletedColumn`) saves
    70-80ms on 10k+ rows.

commit f35fdcd5f129339fce75996e9242c88085a9b8ab
Author: Dmitry Parfenchik <[email protected]>
Date:   2017-07-30T20:26:59Z

    [SPARK-21254][WebUI] Detaching DOM before DataTables plugin processing
    
    Detaching the history table wrapper from the document before parsing it with the
    DataTables plugin and reattaching it right after the plugin has processed the
    nested DOM. This avoids a huge number of browser repaints and reflows, reducing
    initial page load time in Chrome from 15s to 4s for 20k+ rows.
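
One way to express the detach-and-reattach pattern is a small helper like the one below. This is a sketch under assumptions, not the code from the commit: `withDetached` is a hypothetical name, and it relies only on the standard DOM members `parentNode`, `nextSibling`, `removeChild` and `insertBefore`.

```javascript
// Run `work` while `node` is detached from its parent, then put the node
// back in its original position.
function withDetached(node, work) {
  const parent = node.parentNode;
  if (!parent) {
    // Already detached: nothing to restore afterwards.
    work(node);
    return;
  }
  const next = node.nextSibling;
  parent.removeChild(node);
  try {
    work(node);
  } finally {
    // insertBefore with a null reference node appends at the end.
    parent.insertBefore(node, next);
  }
}
```

In a browser this might be called as `withDetached(wrapper, function (el) { $(el).find("table").DataTable(conf); })`, where `wrapper` and `conf` are placeholders for the table wrapper element and the plugin options.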

commit 96598ab50e795fa7937df497bc84deee7fafce47
Author: Anna Savarin <[email protected]>
Date:   2017-07-28T09:50:11Z

    [HDP-6774] Fixing failing tests by updating HtmlUnit driver dependency

----


