HeartSaVioR commented on a change in pull request #35680:
URL: https://github.com/apache/spark/pull/35680#discussion_r836229369



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4061,10 +4061,23 @@ object SessionWindowing extends Rule[LogicalPlan] {
           case s: SessionWindow => sessionAttr
         }
 
+        val filterTimeSize = gapDuration.child.dataType match {

Review comment:
       I guess we can simplify this a bit more like below:
   
   ```
           val filterByTimeRange = session.gapDuration match {
             case Literal(interval: CalendarInterval, CalendarIntervalType) =>
            interval == null || interval.months + interval.days + interval.microseconds <= 0
   
             case _ => true
           }
   ```
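
    For context, here is a minimal stand-alone sketch of that decision, runnable outside Spark. `Gap` is a hypothetical stand-in for Spark's `CalendarInterval`, and an `Option` stands in for the `Literal` match (`None` models a dynamic, non-literal gap duration):

    ```scala
    // Sketch only: decide whether the time-range filter is needed.
    // `Gap` is NOT Spark's CalendarInterval; it just mirrors its three fields.
    object FilterDecisionSketch {
      final case class Gap(months: Int, days: Int, microseconds: Long)

      // Skip the filter only when the gap is a statically known, positive literal;
      // a dynamic (non-literal) gap always needs the runtime filter.
      def filterByTimeRange(literalGap: Option[Gap]): Boolean = literalGap match {
        case Some(g) => g.months + g.days + g.microseconds <= 0
        case None    => true
      }

      def main(args: Array[String]): Unit = {
        assert(!filterByTimeRange(Some(Gap(0, 0, 5000000L)))) // 5s gap: no filter needed
        assert(filterByTimeRange(Some(Gap(0, 0, -1L))))       // negative gap: keep filter
        assert(filterByTimeRange(None))                       // dynamic gap: keep filter
      }
    }
    ```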

##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
##########
@@ -495,4 +495,65 @@ class DataFrameSessionWindowingSuite extends QueryTest with SharedSparkSession
       validateWindowColumnInSchema(schema2, "session")
     }
   }
+
+  test("SPARK-38349: No need to filter events when gapDuration greater than 0") {
+    // negative value

Review comment:
      Logic for verification can be deduplicated by adding an inner method to this test, like below:
   
   ```
       def checkFilterCondition(
           logicalPlan: LogicalPlan,
           expectTimeRange: Boolean,
           assertHintMsg: String): Unit = {
         val filter = logicalPlan.find { plan =>
           plan.isInstanceOf[Filter] && plan.children.head.isInstanceOf[Project]
         }
         assert(filter.isDefined)
         val exist = filter.get.expressions.flatMap { expr =>
           expr.collect { case gt: GreaterThan => gt }
         }
         if (expectTimeRange) {
           assert(exist.nonEmpty, assertHintMsg)
         } else {
           assert(exist.isEmpty, assertHintMsg)
         }
       }
   ```
   
   and call the method per case.
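
    As a side note, the `find`-then-match traversal that the helper relies on can be sketched stand-alone with a toy tree. All class names below are stand-ins for illustration, not Spark's actual `TreeNode` API:

    ```scala
    // Toy plan tree mirroring the find pattern in the suggested helper.
    sealed trait Node { def children: Seq[Node] }
    final case class ProjectNode(children: Seq[Node]) extends Node
    final case class FilterNode(children: Seq[Node]) extends Node
    final case class LeafNode() extends Node { def children: Seq[Node] = Nil }

    object TreeFindSketch {
      // Pre-order search, analogous in spirit to TreeNode.find: test the node
      // itself first, then recurse into children left to right.
      def find(n: Node)(p: Node => Boolean): Option[Node] =
        if (p(n)) Some(n)
        else n.children.foldLeft(Option.empty[Node])((acc, c) => acc.orElse(find(c)(p)))

      def main(args: Array[String]): Unit = {
        val plan: Node = FilterNode(Seq(ProjectNode(Seq(LeafNode()))))
        val hit = find(plan) { n =>
          n.isInstanceOf[FilterNode] && n.children.headOption.exists(_.isInstanceOf[ProjectNode])
        }
        assert(hit.isDefined) // finds the Filter sitting directly above a Project
      }
    }
    ```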
   
    Let's use the logical plan (`df.queryExecution.logical`) instead of the optimized plan, since Spark may optimize the plan and change it beyond what the test expects.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


