GitHub user 2ooom opened a pull request:
https://github.com/apache/spark/pull/18777
[SPARK-21254] [WebUI] History UI Performance fixes
## What changes were proposed in this pull request?
As described in the JIRA ticket, the History page takes about 1 minute to
load when there are 10k+ jobs.
Most of that time is currently spent on DOM manipulations and the
additional costs they imply (browser repaints and reflows).
The goal of this PR is not to change any behavior but to reduce History UI
rendering time:
1. The most costly operation is setting `innerHTML` for the `duration` column
within a loop, which is [extremely
slow](https://jsperf.com/jquery-append-vs-html-list-performance/24).
[Refactoring
this](https://github.com/criteo-forks/spark/commit/170dfe615883d869d1da7e581dfdbc9ce191afd6)
brought the load time **down to 10-15s**.
2. The second big gain, bringing page load time **down to 4s**, [was
achieved](https://github.com/criteo-forks/spark/commit/2f72c98de4c092a29fa3a0eb9bd229d6bada25e5)
by detaching the table's DOM before parsing it with the DataTables jQuery plugin.
3. Another chunk of improvements
([1](https://github.com/criteo-forks/spark/commit/07c6a3f57dfcc659d41c59bb394dbc3f8fa989e3),
[2](https://github.com/criteo-forks/spark/commit/14da1621e17d0301a837a24a3307db8f43a0a102),
[3](https://github.com/criteo-forks/spark/commit/2f72c98de4c092a29fa3a0eb9bd229d6bada25e5))
focused on removing unnecessary DOM manipulations that together
contributed ~250ms to page load time.
## How was this patch tested?
Tested with the existing Selenium tests in
`org.apache.spark.deploy.history.HistoryServerSuite`. The HtmlUnitDriver
version in use had a bug that prevented the full table from rendering and
made the test `ajax
rendered relative links are prefixed with uiRoot (spark.ui.proxyBase)`
fail consistently; the HtmlUnit driver dependency is updated to fix this.
The changes were also tested on Criteo's spark-2.1 fork with 20k+ rows in
the table, reducing load time to 4s.
Please see the attached UI screenshot, which shows no visual differences.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/criteo-forks/spark history-ui-perf-fix-upstream-2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18777.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18777
----
commit 170dfe615883d869d1da7e581dfdbc9ce191afd6
Author: Dmitry Parfenchik <[email protected]>
Date: 2017-06-05T20:42:41Z
History UI: Improving performance by detaching table DOM before processing
Currently all the DOM manipulations are handled in a loop after the Mustache
template is parsed. This causes severe performance issues, especially in
loops iterating over thousands of (attempt/application) records, and triggers
all kinds of unnecessary browser work: reflow, repaint, etc.
This can be easily fixed by preparing a DOM node beforehand and doing all
manipulations within the loops on the detached node, reattaching it to the
document only after the work is done.
The most costly operation in this case was setting innerHTML for the
`duration` column within a loop, which is extremely slow:
https://jsperf.com/jquery-append-vs-html-list-performance/24
Duration parsing can instead be done before Mustache template processing,
without any additional DOM alterations.
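
The idea of formatting durations before the template runs can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `formatDuration`, `withFormattedDurations` and the `durationText` field are hypothetical names, and the `duration` field is assumed to hold milliseconds.

```javascript
// Hypothetical helper: turn a duration in milliseconds into a short label.
function formatDuration(ms) {
  const seconds = ms / 1000;
  if (seconds < 60) return seconds.toFixed(1) + " s";
  const minutes = seconds / 60;
  if (minutes < 60) return minutes.toFixed(1) + " min";
  return (minutes / 60).toFixed(1) + " h";
}

// Attach the preformatted text to each record up front, so the template can
// print it directly and no per-row innerHTML update is needed afterwards.
// Assumption: each attempt has a numeric `duration` field in milliseconds.
function withFormattedDurations(attempts) {
  return attempts.map(attempt => ({
    ...attempt,
    durationText: formatDuration(attempt.duration)
  }));
}
```

With this shape, the template would reference the precomputed field (here `durationText`) instead of the raw value, and the rendered rows need no further DOM work for that column.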
commit 07c6a3f57dfcc659d41c59bb394dbc3f8fa989e3
Author: Dmitry Parfenchik <[email protected]>
Date: 2017-07-30T14:52:24Z
Performance optimization for pagination check
The check for whether to display pagination on large data sets (10-20k rows)
was taking up to 50ms because it iterated over all rows. The same check can
be done by testing the length of the array before passing it to Mustache.
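
A rough sketch of that length-based check, under stated assumptions: `buildTableOptions` and `PAGE_SIZE` are hypothetical names, the threshold of 20 is illustrative, and the returned object is assumed to be merged into the DataTables configuration (`paging` and `pageLength` are option names from DataTables 1.10+).

```javascript
// Hypothetical threshold: number of rows shown per page.
const PAGE_SIZE = 20;

// Decide on pagination from the data array itself. Reading `length` is a
// constant-time operation, whereas counting rendered <tr> elements requires
// walking the whole table in the DOM.
function buildTableOptions(rows, pageSize = PAGE_SIZE) {
  return {
    paging: rows.length > pageSize,
    pageLength: pageSize
  };
}
```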
commit 14da1621e17d0301a837a24a3307db8f43a0a102
Author: Dmitry Parfenchik <[email protected]>
Date: 2017-07-30T15:23:37Z
Performance improvement: removing unnecessary DOM processing
Logic related to the `hasMultipleAttempts` flag:
- Hiding the attemptId column (if `hasMultipleAttempts = false`)
- Setting a white background color for the first 2 columns (if
`hasMultipleAttempts = true`)
was updating the DOM after Mustache template processing, which caused 2
unnecessary iterations over the full data set (first through a jQuery
selector, then through a for-loop).
Moving it into the Mustache template saves 80-90ms on large data
sets (10k+ rows).
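
The general pattern, deciding at render time whether a column exists rather than hiding it afterwards, might look like the sketch below. It uses plain template literals as a stand-in for Mustache sections, so it is not the template from the commit; the record shape (`id`, `attempts`, `attemptId`, `sparkUser`) and all function names are assumptions made for illustration.

```javascript
// Minimal HTML escaping for values interpolated into markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Compute the flag once from the data, before any markup is produced.
function hasMultipleAttempts(apps) {
  return apps.some(app => app.attempts.length > 1);
}

// Emit the attempt-id cell only when it is needed, so nothing has to be
// hidden or restyled in the DOM after rendering.
function renderRow(app, attempt, multi) {
  const attemptCell = multi ? `<td>${escapeHtml(attempt.attemptId)}</td>` : "";
  return `<tr><td>${escapeHtml(app.id)}</td>${attemptCell}` +
    `<td>${escapeHtml(attempt.sparkUser)}</td></tr>`;
}

function renderRows(apps) {
  const multi = hasMultipleAttempts(apps);
  return apps
    .flatMap(app => app.attempts.map(attempt => renderRow(app, attempt, multi)))
    .join("");
}
```

In a Mustache template the same decision would be expressed with a section keyed on the flag, which keeps the conditional in the template and out of post-processing code.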
commit 80c11663c28d183161648de5f7a32fa3ae49cfd8
Author: Dmitry Parfenchik <[email protected]>
Date: 2017-07-30T20:06:32Z
Performance improvement: further reducing DOM manipulations
Refactoring the incomplete requests filter behavior due to inefficiency in
DOM manipulations. We were traversing the DOM 2 more times just to hide
columns that we could have avoided rendering in Mustache. Moving this logic
into the Mustache template (`showCompletedColumn`) saves 70-80ms on 10k+ rows.
commit 2f72c98de4c092a29fa3a0eb9bd229d6bada25e5
Author: Dmitry Parfenchik <[email protected]>
Date: 2017-07-30T20:26:59Z
Performance improvements: detaching DOM before DataTables plugin processing
Detaching the history table wrapper from the document before parsing it with
the DataTables plugin and reattaching it right after the plugin has processed
the nested DOM. This avoids a huge number of browser repaints and reflows,
reducing initial page load time in Chrome from 15s to 4s for 20k+ rows.
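
A minimal sketch of the detach-and-reattach pattern, using only the standard DOM methods `removeChild` and `insertBefore`. The helper name `withDetached` is hypothetical and this is not the code from the commit, which works with jQuery; the sketch only shows the ordering of the steps.

```javascript
// Run `work` on a node while it is detached from its parent, then put the
// node back in its original position. While detached, changes to the
// subtree do not trigger reflows or repaints of the live document.
function withDetached(node, work) {
  const parent = node.parentNode;
  if (!parent) {
    // Already detached: nothing to remove or restore.
    return work(node);
  }
  const next = node.nextSibling; // remember the position before removal
  parent.removeChild(node);
  try {
    return work(node);
  } finally {
    // insertBefore with a null reference appends at the end of the parent.
    parent.insertBefore(node, next);
  }
}
```

In a page, `work` would be the expensive step, for example initializing the DataTables plugin on the table inside the detached wrapper. The `finally` block ensures the wrapper is reattached even if that step throws.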
commit e487a4eda4cfbd311f1587ad2852688f94f3b6a9
Author: Anna Savarin <[email protected]>
Date: 2017-07-28T09:50:11Z
[HDP-6774] Fixing failing tests by updating HtmlUnit driver dependency
----