Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1678#discussion_r15682412
  
    --- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
    @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter(
     
       override def isOpen: Boolean = objOut != null
     
    -  override def commit(): Long = {
    +  override def commitAndClose(): Unit = {
         if (initialized) {
           // NOTE: Because Kryo doesn't flush the underlying stream we 
explicitly flush both the
           //       serializer stream and the lower level stream.
           objOut.flush()
           bs.flush()
    -      val prevPos = lastValidPosition
    -      lastValidPosition = channel.position()
    -      lastValidPosition - prevPos
    -    } else {
    -      // lastValidPosition is zero if stream is uninitialized
    -      lastValidPosition
    +      close()
         }
    +    finalPosition = file.length()
       }
     
    -  override def revertPartialWrites() {
    -    if (initialized) {
    -      // Discard current writes. We do this by flushing the outstanding 
writes and
    -      // truncate the file to the last valid position.
    -      objOut.flush()
    -      bs.flush()
    -      channel.truncate(lastValidPosition)
    +  // Discard current writes. We do this by flushing the outstanding writes 
and then
    +  // truncating the file to its initial position.
    +  override def revertPartialWritesAndClose() {
    +    try {
    +      if (initialized) {
    +        objOut.flush()
    +        bs.flush()
    +        close()
    +      }
    +
    +      val truncateStream = new FileOutputStream(file, true)
    +      try {
    +        truncateStream.getChannel.truncate(initialPosition)
    +      } finally {
    +        truncateStream.close()
    +      }
    +    } catch {
    +      case e: Exception =>
    +        logError("Uncaught exception while reverting partial writes to 
file " + file, e)
    --- End diff --
    
    In the use of writers in HashShuffleWriter, it is possible for a closed 
stream to be reverted (if some other stream's close failed for example).
    In the above, that will leave this file with leftover data - I am not sure 
what the impact of this would be.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to