xuanyuanking commented on a change in pull request #28904:
URL: https://github.com/apache/spark/pull/28904#discussion_r456423370
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
##########
@@ -106,10 +106,8 @@ abstract class CompactibleFileStreamLog[T <: AnyRef :
ClassTag](
interval
}
- /**
- * Filter out the obsolete logs.
- */
- def compactLogs(logs: Seq[T]): Seq[T]
+ /** Determine whether the log should be retained or not. */
+ def shouldRetain(log: T): Boolean
Review comment:
I took some time to look into the details. It seems we can use `Stream`
here, which delays the evaluation of each file status. That way we can
keep the existing API and still get the improvement. But if we use
`Stream[Int]` here, we place a restriction on implementations of
`compactLogs`: they also need to return a `Stream[Int]`, otherwise the
laziness will not take effect.
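To illustrate the laziness point above, here is a minimal standalone sketch (not Spark's actual code; `touchedCount` and the `shouldRetain` predicate are made up for this example). It counts how many elements a filter predicate actually evaluates: a strict `Seq` touches every element up front, while a `Stream` only evaluates as many as are needed — and that laziness is lost as soon as any step in the chain forces a strict collection.

```scala
object LazyFilterSketch {
  // Counts how many elements the predicate touches when taking the
  // first `n` retained entries from `logs`. The predicate stands in
  // for a per-entry check like shouldRetain.
  def touchedCount(logs: Seq[Int], n: Int): Int = {
    var touched = 0
    def shouldRetain(log: Int): Boolean = { touched += 1; log % 2 == 0 }
    // filter dispatches to the runtime collection type: lazy for
    // Stream, eager for a strict Seq such as Range.
    logs.filter(shouldRetain).take(n).foreach(_ => ())
    touched
  }

  def main(args: Array[String]): Unit = {
    // Strict Seq: filter evaluates all 100 elements before take(3).
    println(touchedCount(1 to 100, 3))             // 100
    // Stream: only elements 1..6 are evaluated, enough to yield
    // three retained (even) entries.
    println(touchedCount(Stream.range(1, 101), 3)) // 6
  }
}
```

This mirrors the restriction noted above: the lazy behavior survives only while every stage of the pipeline returns a `Stream`; one eager stage forces the whole prefix.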
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]