Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16476#discussion_r95080769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -340,3 +341,91 @@ object CaseKeyWhen {
         CaseWhen(cases, elseValue)
       }
     }
    +
    +/**
    + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
    + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
    +  extended = """
    +    Examples:
    +      > SELECT _FUNC_(10, 9, 3, 10, 4);
    +       3
    +  """)
    +case class Field(children: Seq[Expression]) extends Expression {
    +
    +  override def nullable: Boolean = false
    +  override def foldable: Boolean = children.forall(_.foldable)
    +
    +  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
    +    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
    +      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
    +    } else
    +      TypeCheckResult.TypeCheckSuccess
    +  }
    +
    +  override def dataType: DataType = IntegerType
    +
    +  override def eval(input: InternalRow): Any = {
    +    val target = children.head.eval(input)
    +    val targetDataType = children.head.dataType
    +    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
    +      params.toList match {
    --- End diff ---
    
    `toList` probably causes performance overhead; I don't think we have to sacrifice performance just to use pattern matching. In the meantime, I still believe we don't have to check the data type at runtime. That check is supposed to happen at `compile` time, or at most once on the first call to `eval`.
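    
    For instance, a rough sketch of the shape I have in mind (hypothetical, not the PR's code): scan `children` by index so no intermediate `List` is built per row, and rely only on the lazily built `ordering` for the comparison:
    
    ```scala
    // Sketch only: index-based scan, no per-row toList and no per-row type check.
    // Assumes (as one possible choice) that a null search key simply yields 0.
    override def eval(input: InternalRow): Any = {
      val target = children.head.eval(input)
      var result = 0
      if (target != null) {
        var i = 1
        while (i < children.length && result == 0) {
          val v = children(i).eval(input)
          // `ordering` is the interpreted ordering built once from children(0).dataType
          if (v != null && ordering.equiv(v, target)) {
            result = i
          }
          i += 1
        }
      }
      result
    }
    ```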
    
    The `Field` evaluation is quite confusing. As @gatorsmile suggested, we need to describe how the value is evaluated when the sub-expressions' data types are different.
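    
    To make that concrete, one possible way to pin the semantics down (purely an illustration, with names I'm making up) is to fix a single comparison type at analysis time, e.g. the first argument's type, and cast the other children to it, so `eval` never has to look at per-row types:
    
    ```scala
    // Hypothetical sketch: compare everything as the head's data type.
    // A real change would more likely go through the existing type-coercion rules.
    private lazy val comparisonType: DataType = children.head.dataType
    
    private lazy val coercedChildren: Seq[Expression] = children.map { c =>
      if (c.dataType == comparisonType) c else Cast(c, comparisonType)
    }
    ```
    
    Whatever rule is chosen, the `usage`/`extended` doc should spell it out, e.g. what a call mixing string and numeric arguments returns.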


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to