[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...

dongjoon-hyun Thu, 24 May 2018 13:34:55 -0700

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21416#discussion_r190722260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -220,6 +219,7 @@ object OptimizeIn extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case q: LogicalPlan => q transformExpressionsDown {
           case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
    +      case In(v, list) if list.length == 1 => EqualTo(v, list.head)
    --- End diff --
    
    Could you add the following test case, too? The `=` form plans fine, but the `in` form fails with a `StackOverflowError`:
    ```scala
    scala> sql("select * from t group by a having count(*) = (select count(*) from t)").explain
    == Physical Plan ==
    *(2) Project [a#2L]
    +- *(2) Filter (count(1)#75L = Subquery subquery62)
       :  +- Subquery subquery62
       :     +- *(2) HashAggregate(keys=[], functions=[count(1)])
       :        +- Exchange SinglePartition
       :           +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
       :              +- *(1) Project
       :                 +- *(1) Range (0, 1, step=1, splits=8)
       +- *(2) HashAggregate(keys=[a#2L], functions=[count(1)])
          +- Exchange hashpartitioning(a#2L, 200)
             +- *(1) HashAggregate(keys=[a#2L], functions=[partial_count(1)])
                +- *(1) Project [id#0L AS a#2L]
                   +- *(1) Range (0, 1, step=1, splits=8)
    
    scala> sql("select * from t group by a having count(*) in (select count(*) from t)").explain
    java.lang.StackOverflowError
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    ```
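
    One possible reading of the failure, not confirmed in this thread, is that the new single-element case also matches an `in` whose list holds one subquery expression rather than one plain value. The sketch below is a toy model of that idea and does not use Spark's Catalyst classes: `Attr`, `Lit`, `ListQuery`, and `ToyOptimizeIn` are hypothetical stand-ins whose names only mirror the diff, and the guard is an assumption about what a fix might look like, not the change made in the PR.
    ```scala
    // Toy expression tree. These are NOT Catalyst classes; they only mirror the names in the diff.
    sealed trait Expr
    final case class Attr(name: String) extends Expr
    final case class Lit(value: Long) extends Expr
    final case class ListQuery(sql: String) extends Expr // stand-in for an IN-subquery
    final case class In(value: Expr, list: Seq[Expr]) extends Expr
    final case class EqualTo(left: Expr, right: Expr) extends Expr

    object ToyOptimizeIn {
      // Rewrite a one-element IN to equality, but leave subquery lists untouched (assumed guard).
      def rewrite(e: Expr): Expr = e match {
        case In(v, Seq(single)) if !single.isInstanceOf[ListQuery] => EqualTo(v, single)
        case other => other
      }
    }

    object ToyOptimizeInDemo {
      def main(args: Array[String]): Unit = {
        // A single literal is rewritten to an equality.
        println(ToyOptimizeIn.rewrite(In(Attr("a"), Seq(Lit(1L)))))
        // A single subquery is left as an IN.
        println(ToyOptimizeIn.rewrite(In(Attr("cnt"), Seq(ListQuery("select count(*) from t")))))
      }
    }
    ```
    In a real test, the two queries above would be the inputs, and the check would be that planning the `in` form completes without an error.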
