attilapiros commented on a change in pull request #33628:
URL: https://github.com/apache/spark/pull/33628#discussion_r771466221



##########
File path: 
core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala
##########
@@ -184,4 +184,20 @@ class DiskBlockObjectWriterSuite extends SparkFunSuite with BeforeAndAfterEach {
     writer.close()
     assert(segment.length === 0)
   }
+
+  test("calling closeAndDelete() on a partial write file") {
+    val (writer, file, writeMetrics) = createWriter()
+
+    writer.write(Long.box(20), Long.box(30))
+    val firstSegment = writer.commitAndGet()
+    assert(firstSegment.length === file.length())
+    assert(writeMetrics.bytesWritten === file.length())
+
+    writer.write(Long.box(40), Long.box(50))
+
+    writer.closeAndDelete()
+    assert(!file.exists())
+    assert(writeMetrics.bytesWritten === firstSegment.length)

Review comment:
       > Like [9db7115](https://github.com/apache/spark/commit/9db7115fc980c80ee517f46e6844b39d76c93559) changed?
   
   Not exactly. I would keep track of the committed records, not the total
written records.
   When the committed records are counted, you do not need to increase the
new var every time a record is written, only when a batch of records is
committed.
   So this line is not needed:
   
https://github.com/apache/spark/blob/9db7115fc980c80ee517f46e6844b39d76c93559/core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala#L329
   But you do need to increase the new var after line 232, before the reset of
`numRecordsWritten`:
https://github.com/apache/spark/blob/9db7115fc980c80ee517f46e6844b39d76c93559/core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala#L231-L233
   
   This way, when the file is removed, you can decrease the metric by the sum of
the new var and `numRecordsWritten` at:
   
https://github.com/apache/spark/blob/9db7115fc980c80ee517f46e6844b39d76c93559/core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala#L291
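
   A minimal sketch of the bookkeeping described above, written as a
self-contained model rather than the actual `DiskBlockObjectWriter` code.
`RecordMetrics`, `SketchWriter`, and `numRecordsCommitted` (standing in for
"the new var") are hypothetical names used only for illustration, and the
assumption that the metric is incremented once per written record is part of
the model:

   ```scala
   // Simplified model of the proposal; not the real DiskBlockObjectWriter.
   // RecordMetrics is a hypothetical stand-in for the shuffle write metrics.
   final class RecordMetrics {
     private var _recordsWritten = 0L
     def recordsWritten: Long = _recordsWritten
     def inc(n: Long): Unit = _recordsWritten += n
     def dec(n: Long): Unit = _recordsWritten -= n
   }

   final class SketchWriter(metrics: RecordMetrics) {
     // Records written since the last commit.
     private var numRecordsWritten = 0L
     // Hypothetical name for "the new var": records already committed.
     private var numRecordsCommitted = 0L

     // Per-record path: the new var is NOT touched here.
     def recordWritten(): Unit = {
       numRecordsWritten += 1
       metrics.inc(1)
     }

     // Commit path: fold the pending count into the committed count
     // before resetting numRecordsWritten. Returns the committed total.
     def commitAndGet(): Long = {
       numRecordsCommitted += numRecordsWritten
       numRecordsWritten = 0L
       numRecordsCommitted
     }

     // File removal: roll the metric back by committed plus uncommitted records.
     def closeAndDelete(): Unit = {
       metrics.dec(numRecordsCommitted + numRecordsWritten)
       numRecordsCommitted = 0L
       numRecordsWritten = 0L
     }
   }
   ```

   With this shape, the committed counter is updated once per commit instead
of once per record, and deleting the file returns the records metric to the
value it had before the writer was used.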
    
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


