HeartSaVioR commented on a change in pull request #28904:
URL: https://github.com/apache/spark/pull/28904#discussion_r444051029



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
##########
@@ -222,21 +256,22 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
         try {
           val logs =
             getAllValidBatches(latestId, compactInterval).flatMap { id =>
-              super.get(id).getOrElse {
+              filterInBatch(id)(shouldRetain).getOrElse {
                 throw new IllegalStateException(
                   s"${batchIdToPath(id)} doesn't exist " +
                     s"(latestId: $latestId, compactInterval: $compactInterval)")
               }
             }
-          return compactLogs(logs).toArray
+          return logs.toArray
         } catch {
           case e: IOException =>
             // Another process using `CompactibleFileStreamLog` may delete the batch files when
             // `StreamFileIndex` are reading. However, it only happens when a compaction is
             // deleting old files. If so, let's try the next compaction batch and we should find it.
             // Otherwise, this is a real IO issue and we should throw it.
-            latestId = nextCompactionBatchId(latestId, compactInterval)
-            super.get(latestId).getOrElse {
+            val expectedMinLatestId = nextCompactionBatchId(latestId, compactInterval)

Review comment:
       This new approach avoids reading the next compact log file, which would
materialize all of the entries in that file. This should be an extreme case, so
it's also OK to keep the code as it was if someone strongly thinks the previous
approach is better.
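
For readers following the hunk, here is a minimal sketch of the batch-id arithmetic behind `expectedMinLatestId`. It assumes a batch is a compaction batch when `(batchId + 1)` is a multiple of `compactInterval`. The object name `CompactionBatchSketch` is hypothetical, and the formula is written from that assumption rather than copied from the PR. What the patch does with `expectedMinLatestId` afterwards is not visible in the hunk above, so the sketch stops at computing the value.

```scala
// Hypothetical, self-contained sketch; not the code from the pull request.
object CompactionBatchSketch {

  // Assumption: batch ids are 0-based and every `compactInterval`-th batch
  // is a compaction batch, i.e. ids interval - 1, 2 * interval - 1, ...
  def isCompactionBatch(batchId: Long, compactInterval: Long): Boolean =
    (batchId + 1) % compactInterval == 0

  // Smallest compaction batch id strictly greater than `batchId`.
  def nextCompactionBatchId(batchId: Long, compactInterval: Long): Long =
    (batchId + compactInterval + 1) / compactInterval * compactInterval - 1

  def main(args: Array[String]): Unit = {
    val compactInterval = 3L
    val latestId = 4L
    // If reading the batches up to `latestId` fails because a concurrent
    // compaction deleted old files, a compaction batch with at least this id
    // should exist by the time of the retry.
    val expectedMinLatestId = nextCompactionBatchId(latestId, compactInterval)
    println(s"latestId=$latestId expectedMinLatestId=$expectedMinLatestId")
  }
}
```

Under these assumptions, with `compactInterval = 3` the compaction batches are 2, 5, 8, and so on, so a failed read at `latestId = 4` implies that the latest batch id seen on retry should be at least 5.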






