Github user tanejagagan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16497#discussion_r95305820
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
---
@@ -81,7 +96,11 @@ case class Percentile(
case arrayData: ArrayData => arrayData.toDoubleArray().toSeq
}
- override def children: Seq[Expression] = child :: percentageExpression
:: Nil
+ override def children: Seq[Expression] = if (withFrqExpr) {
--- End diff --
I have given lot of thought to it and if we need to make a difference here.
Lets take a data set of
age_string, age, count
"20", 20, 1
"15", 15, 1
"10", 10, 1
For Sql "select percentile( age, count , 0.5 ) from table"
logically correct values should be
children = age::count ::0.5 :: Nil
and
inputType = IntegerType :: IntegerType::DoubleType::Nil
For sql "select pecentile( age, 0.5 ) from table"
logically correct values should be
children = age::0.5 :: Nil
and
inputType = IntegerType ::DoubleType::Nil
Here is one example where keeping it logically correct would help
For following incorrect SQL "select percentile( age, '10') from table"
With children = age::'10'::Nil and inputType = IntergerType::StringType::
Nil
Since both children and inputType is used for dataType validation, the
error message would be correct as below.
"argument 2 requires Double type, however, 10 is of String type."
However With children = age::Literal(1)::'10'::Nil and inputType =
IntergerType::IntegerType::StringType:: Nil
The error message would be NOT correct and confusing as below
"argument 3 requires Double type, however, 10 is of String type."
Since both children and dataType are public method i was inclined to keep
them explicitly correct and therefore i decided to make a difference.
Please let me know your thoughts
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]