Github user tanejagagan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16497#discussion_r95305820
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
    @@ -81,7 +96,11 @@ case class Percentile(
           case arrayData: ArrayData => arrayData.toDoubleArray().toSeq
       }
     
    -  override def children: Seq[Expression] = child :: percentageExpression 
:: Nil
    +  override def children: Seq[Expression] = if (withFrqExpr) {
    --- End diff --
    
    I have given lot of thought to it and if we need to make a difference here.
    Lets take a data set of
    age_string, age, count
    "20", 20, 1
    "15", 15, 1
    "10", 10, 1
    
    
    For Sql  "select percentile( age, count , 0.5 ) from table"
    logically correct values should be  
    children = age::count ::0.5 :: Nil 
    and 
    inputType = IntegerType :: IntegerType::DoubleType::Nil
    
    For sql "select pecentile( age, 0.5 ) from table"
    logically correct values should be  
    children = age::0.5 :: Nil 
    and 
    inputType = IntegerType ::DoubleType::Nil
    
    Here is one example where keeping it logically correct would help 
    For following incorrect SQL "select percentile( age, '10') from table" 
    
    With children = age::'10'::Nil and inputType = IntergerType::StringType:: 
Nil 
    Since both children and inputType is used for dataType validation, the 
error message would be correct as below.
    "argument 2 requires Double type, however, 10 is of String type." 
    
    However With children = age::Literal(1)::'10'::Nil and inputType = 
IntergerType::IntegerType::StringType:: Nil 
    The error message would be NOT correct and confusing as below
    "argument 3 requires Double type, however, 10 is of String type." 
    
    Since both children and dataType are public method i was inclined to keep 
them explicitly correct and therefore i decided to make a difference. 
    Please let me know your thoughts 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to