GitHub user devaraj-kavali opened a pull request:
https://github.com/apache/spark/pull/22752
[SPARK-24787][CORE] Revert hsync in EventLoggingListener and make
FsHistoryProvider to read lastBlockBeingWritten data for logs
## What changes were proposed in this pull request?
`hsync` was added as part of SPARK-19531 so that the history server UI could
show the latest data, but it causes performance overhead and leads to many
history log events being dropped. `hsync` uses `FileChannel.force` to sync the
data to disk, and this happens along the data pipeline; it is a costly
operation that adds overhead to the application and makes it drop events.
I think getting the latest data into the history server can be done in a
different way, with no impact on the application while it writes events. The
API `DFSInputStream.getFileLength()` returns the file length including the
`lastBlockBeingWrittenLength` (unlike `FileStatus.getLen()`). When the file
status length and the previously cached length are equal, this API can be used
to verify whether any new data has been written; if the data length has
changed, the history server can update the in-progress history log.
I also made this change configurable, with a default value of false; it can be
enabled for the history server if users want to see the updated data in the UI.
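
The length check described above can be sketched roughly as follows. This is
an illustration of the decision logic only, not the code in this patch: the
names `LogInfo`, `should_reload`, `status_length` and `absolute_length` are
hypothetical, and the two lengths stand in for `FileStatus.getLen()` and
`DFSInputStream.getFileLength()` respectively. The absolute length is passed
as a callable on the assumption that obtaining it means opening the file, so
it is only consulted when the cheaper comparison shows no change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LogInfo:
    """Hypothetical record of what the history server cached for one log."""
    path: str
    cached_length: int  # length seen when the log was last parsed


def should_reload(info: LogInfo,
                  status_length: int,
                  absolute_length: Callable[[], int],
                  check_enabled: bool) -> bool:
    """Decide whether an in-progress log should be re-read.

    status_length stands in for FileStatus.getLen(); absolute_length stands
    in for DFSInputStream.getFileLength(), which also counts the last block
    still being written.
    """
    # The file status already shows growth: no extra check is needed.
    if status_length > info.cached_length:
        return True
    # The extra check is opt-in (disabled by default).
    if not check_enabled:
        return False
    # Lengths are equal: ask for the absolute length to detect data that
    # sits in the block currently being written.
    if status_length == info.cached_length:
        return absolute_length() > info.cached_length
    return False
```

For example, with a cached length of 100 and a status length of 100, the
sketch reports a reload only when the check is enabled and the absolute length
is larger than 100.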
## How was this patch tested?
Added a new test and verified manually: with the added conf
`spark.history.fs.inProgressAbsoluteLengthCheck.enabled=true`, the history
server reads the logs, including the data in the last block being written, and
updates the Web UI with the latest data.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/devaraj-kavali/spark SPARK-24787
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22752.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22752
----
commit a3f53c41879e28d71d4dbd79d80a51e50d82ecee
Author: Devaraj K <devaraj@...>
Date: 2018-10-16T23:50:20Z
[SPARK-24787][CORE] Revert hsync in EventLoggingListener and make
FsHistoryProvider to read lastBlockBeingWritten data for logs
----