HeartSaVioR commented on a change in pull request #35526:
URL: https://github.com/apache/spark/pull/35526#discussion_r807405322



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3927,8 +3927,12 @@ object TimeWindowing extends Rule[LogicalPlan] {
           val projections = windows.map(_ +: child.output)
 
           val filterExpr =
-            window.timeColumn >= windowAttr.getField(WINDOW_START) &&
-              window.timeColumn < windowAttr.getField(WINDOW_END)
+            if (window.windowDuration % window.slideDuration == 0) {

Review comment:
       Probably good to leave code comment like below:
   
   > When the condition `windowDuration % slideDuration = 0` is fulfilled, the 
estimation of the number of windows becomes exact one, which means all produced 
windows are valid.
   
   I'm not a native so the sentence may not be perfect, but may be acceptable 
and understandable.

##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala
##########
@@ -490,4 +490,46 @@ class DataFrameTimeWindowingSuite extends QueryTest with 
SharedSparkSession {
       assert(attributeReference.dataType == tuple._2)
     }
   }
+
+  test("SPARK-38214: No need to filter data when the sliding window length is 
not redundant") {

Review comment:
       It would be nice if we are clear about what we want to test.
   
   If we want to verify that the change still produces right windows, I'd 
rather verify the boundary of time range explicitly, even it is much verbose.
   
   If we have such test in existing tests, and want to ensure we don't inject 
comparison expression on calculation of time window, we probably need to look 
into logical plan (especially Filter) and verify the expression used in Filter.
   
   If we have such test in existing tests and you feel uneasy to verify the 
expression in Filter node, it's OK to skip adding the test, since it means the 
functionality is validated with existing tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to