GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/18979
[SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics
collection fails if a new file isn't yet visible
## What changes were proposed in this pull request?
`BasicWriteTaskStatsTracker.getFileSize()` now catches
`FileNotFoundException`, logs it at info level, and returns 0 as the file size.
This ensures that if a newly created file is not yet visible, because the store
does not always offer create consistency, metric collection does not cause a
failure.
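
As a rough illustration of the behaviour described above, here is a minimal sketch. It is not the patch itself: `Store`, `MapStore` and `FileSizeProbe` are hypothetical names invented for this example, and an in-memory map stands in for the real file system so that the snippet runs without Spark or Hadoop on the classpath.

```scala
import java.io.FileNotFoundException

// Hypothetical stand-in for a file system client; not a Spark or Hadoop API.
trait Store {
  /** Returns the file length, or throws FileNotFoundException if the path is not visible. */
  def length(path: String): Long
}

// In-memory store in which only the listed paths are visible.
final class MapStore(visible: Map[String, Long]) extends Store {
  def length(path: String): Long =
    visible.getOrElse(path, throw new FileNotFoundException(path))
}

object FileSizeProbe {
  // Only FileNotFoundException is swallowed; any other exception propagates.
  def getFileSize(store: Store, path: String): Long =
    try {
      store.length(path)
    } catch {
      case e: FileNotFoundException =>
        // The patch logs at info level; println stands in for the logger here.
        println(s"File $path is not yet visible: ${e.getMessage}")
        0L
    }
}
```

With a store that only exposes `part-0000`, `FileSizeProbe.getFileSize(store, "part-0000")` returns the stored length, while a lookup of any other path returns 0 instead of throwing.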
## How was this patch tested?
New test suite included, `BasicWriteTaskStatsTrackerSuite`. This not only
checks the resilience to missing files, but also verifies the existing logic
for gathering file statistics.
Note that in the current implementation:
1. If you call `Tracker.getFinalStats()` more than once, the file size
count will increase by the size of the last file. This could be fixed by clearing
the filename field inside `getFinalStats()` itself.
2. If you pass an empty or null string to `Tracker.newFile(path)`, then an
`IllegalArgumentException` is raised, but only in `getFinalStats()`, rather than
in `newFile`. There's a test for this behaviour in the new suite, as it
verifies that only `FileNotFoundException`s get swallowed.
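
The fix suggested in the first note can be sketched with a simplified model of the tracker. This is an assumption-laden illustration rather than Spark's class: `SketchTracker` and its `sizeOf` parameter are hypothetical, and the size lookup is a plain function instead of a file system call.

```scala
// Simplified, hypothetical model of a write-stats tracker; not Spark's implementation.
final class SketchTracker(sizeOf: String => Long) {
  private var numFiles = 0
  private var numBytes = 0L
  private var curFile: Option[String] = None

  def newFile(path: String): Unit = {
    // Fold in the size of the previous file before switching to the new one.
    curFile.foreach(p => numBytes += sizeOf(p))
    curFile = Some(path)
    numFiles += 1
  }

  def getFinalStats(): (Int, Long) = {
    curFile.foreach(p => numBytes += sizeOf(p))
    // Clearing the current file means a repeated call does not count it again.
    curFile = None
    (numFiles, numBytes)
  }
}
```

In this model, calling `getFinalStats()` twice after writing two files returns the same totals both times, because the last file is only counted while `curFile` is still set.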
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/steveloughran/spark cloud/SPARK-21762-missing-files-in-metrics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18979.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18979
----
commit 8ad28b9bcd6a56b963ab57a5b4937d10f492de33
Author: Steve Loughran <[email protected]>
Date: 2017-08-17T19:35:35Z
SPARK-21762 handle FNFE events in BasicWriteStatsTracker; add a suite of
tests for various file states.
Change-Id: I3269cb901a38b33e399ebef10b2dbcd51ccf9b75
commit 2a113fde1653743a3543df8ada395f320b826a3e
Author: Steve Loughran <[email protected]>
Date: 2017-08-17T20:01:50Z
SPARK-21762 add tests for "" and null filenames
Change-Id: I38ac11c808849e2fd91f4931f4cb5cdfad43e2af
----