xuanyuanking commented on a change in pull request #28904:
URL: https://github.com/apache/spark/pull/28904#discussion_r456423370
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
##########
@@ -106,10 +106,8 @@ abstract class CompactibleFileStreamLog[T <: AnyRef :
ClassTag](
interval
}
- /**
- * Filter out the obsolete logs.
- */
- def compactLogs(logs: Seq[T]): Seq[T]
+ /** Determine whether the log should be retained or not. */
+ def shouldRetain(log: T): Boolean
Review comment:
I took some time to look into the details. It seems we can use `Stream`
here, which delays the evaluation of each file status. That way we can
keep the existing API and still get the improvement. But if we use
`Stream[Int]` here, we place a restriction on implementations of
`compactLogs`: they also need to return a `Stream[Int]`, otherwise the
laziness will not take effect.
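To illustrate the laziness point above, here is a minimal standalone sketch (not Spark's actual code; `touchedCount` and the `shouldRetain` predicate are made up for this example). It counts how many elements a filter predicate actually evaluates: a strict `Seq` touches every element up front, while a `Stream` only evaluates as many as are needed — and that laziness is lost as soon as any step in the chain forces a strict collection.

```scala
object LazyFilterSketch {
  // Counts how many elements the predicate touches when taking the
  // first `n` retained entries from `logs`. The predicate stands in
  // for a per-entry check like shouldRetain.
  def touchedCount(logs: Seq[Int], n: Int): Int = {
    var touched = 0
    def shouldRetain(log: Int): Boolean = { touched += 1; log % 2 == 0 }
    // filter dispatches to the runtime collection type: lazy for
    // Stream, eager for a strict Seq such as Range.
    logs.filter(shouldRetain).take(n).foreach(_ => ())
    touched
  }

  def main(args: Array[String]): Unit = {
    // Strict Seq: filter evaluates all 100 elements before take(3).
    println(touchedCount(1 to 100, 3))             // 100
    // Stream: only elements 1..6 are evaluated, enough to yield
    // three retained (even) entries.
    println(touchedCount(Stream.range(1, 101), 3)) // 6
  }
}
```

This mirrors the restriction noted above: the lazy behavior survives only while every stage of the pipeline returns a `Stream`; one eager stage forces the whole prefix.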
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]