Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95080769
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
CaseWhen(cases, elseValue)
}
}
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+ Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+ if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+ } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+ } else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+ val target = children.head.eval(input)
+ val targetDataType = children.head.dataType
+ def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
--- End diff --
`toList` probably adds performance overhead, and I don't think we should
sacrifice performance just to use pattern matching. I also still believe we
don't have to check the data type at runtime. That check should be done at
`compile` time, or only once, on the first call to `eval`.
The `Field` evaluation is quite confusing. As @gatorsmile suggested, we need
to describe how the value is evaluated when the sub-expressions' data types
differ.
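To make the suggestion concrete, here is a minimal, self-contained sketch of an index-based loop that avoids `toList` and resolves the type check once at construction. It does not use the Catalyst API: `DataType`, `Lit`, and `FieldSketch` are hypothetical stand-ins for illustration only. It also assumes two behaviors the PR has not settled: an argument whose data type differs from the target's never matches, and a null target returns 0.

```scala
// Hypothetical stand-ins for Catalyst's DataType and Expression (not the Spark API).
sealed trait DataType
case object IntType extends DataType
case object StrType extends DataType

final case class Lit(value: Any, dataType: DataType) {
  def eval(): Any = value
}

// Sketch of FIELD(str, str1, str2, ...): 1-based index of the first match, or 0.
final class FieldSketch(children: IndexedSeq[Lit]) {
  require(children.length >= 2, "FIELD requires at least 2 arguments")

  private val targetType: DataType = children.head.dataType

  // Computed once, not per row: positions of arguments whose type matches
  // the target's. Assumption: mismatched types can never be equal.
  private val candidates: Array[Int] =
    (1 until children.length).filter(i => children(i).dataType == targetType).toArray

  def eval(): Int = {
    val target = children.head.eval()
    // Assumption: a null target is never found.
    if (target == null) return 0
    var k = 0
    while (k < candidates.length) {
      val i = candidates(k)
      if (children(i).eval() == target) return i
      k += 1
    }
    0
  }
}
```

With this shape, the per-row work is a plain `while` loop over a precomputed array, with no intermediate list allocation and no repeated data type comparison.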