Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/9610#discussion_r44715168
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
@@ -93,6 +93,29 @@ private[spark] class IndexShuffleBlockResolver(conf:
SparkConf) extends ShuffleB
} {
out.close()
}
+
+ val dataFile = getDataFile(shuffleId, mapId)
+ synchronized {
+ if (dataFile.exists() && indexFile.exists()) {
+ if (dataTmp != null && dataTmp.exists()) {
+ dataTmp.delete()
+ }
+ indexTmp.delete()
+ } else {
+ if (indexFile.exists()) {
+ indexFile.delete()
+ }
+ if (!indexTmp.renameTo(indexFile)) {
+ throw new IOException("fail to rename data file " + indexTmp + "
to " + indexFile)
+ }
+ if (dataFile.exists()) {
+ dataFile.delete()
+ }
+ if (dataTmp != null && dataTmp.exists() &&
!dataTmp.renameTo(dataFile)) {
+ throw new IOException("fail to rename data file " + dataTmp + "
to " + dataFile)
--- End diff --
I don't think there is a particular flaw here, but its a bit hard to follow
since its a mix of first-attempt-wins and last-attempt wins. First attempt if
there is a data file & index file; last attempt if its only an index file. the
problem w/ last-attempt is that this delete will fail on windows if the file is
open for reading, I believe. Though we can't get around that because
SPARK-4085 always requires us to delete some files that might be open, in which
case we hope that we don't run into this race again on the next retry. It
would be nice to minimize that case, though. We'd be closer to
first-attempt-wins if we always wrote a dataFile, even if its empty when
dataTmp == null.
There is also an issue w/ mapStatus & non-deterministic data. It might not
matter which output you get, but the mapstatus should be consistent with the
data that is read. If attempt 1 writes non-empty outputs a,b,c, and attempt 2
writes non-empty outputs d,e,f (which are not committed), the reduce tasks
might get the mapstatus for attempt 2, look for outputs d,e,f, and get nothing
but empty blocks. Matei had suggested writing the mapstatus to a file, so that
subsequent attempts always return the map status corresponding to the first
successful attempt.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]