[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- If we try to cast all types to `DoubleType`, the test case for all `StringType` will fail. While if we try to cast all types to `StringType`, parameters of '3' and '3.0' won't be determined as equal. I have a solution to balance, we look at the 1st parameter, if it's of `NumericType`, we implicitly cast all parameters to `DoubleType`, else we cast all parameters to `StringType`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
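For illustration, a minimal sketch of how the heuristic proposed above (first parameter numeric, so cast everything to DoubleType, otherwise cast everything to StringType) could be written as an analyzer rule. This is not code from the PR: the rule name FieldTypeCoercion is hypothetical, and it assumes the `Field` expression class shown in the diff above.

import org.apache.spark.sql.catalyst.expressions.{Cast, Field}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{DoubleType, NumericType, StringType}

// Hypothetical rule implementing the proposal above: look at the first
// argument; if it is numeric, cast every argument to DoubleType, otherwise
// fall back to casting every argument to StringType.
object FieldTypeCoercion extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Field(children) if children.forall(_.resolved) &&
        children.map(_.dataType).distinct.size > 1 =>
      val targetType = children.head.dataType match {
        case _: NumericType => DoubleType
        case _ => StringType
      }
      Field(children.map(c => if (c.dataType == targetType) c else Cast(c, targetType)))
  }
}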
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334083 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- I ran into a problem: if I first try to cast all parameters to DoubleType in `TypeCoercion` (as described in the 2nd paragraph of my last comment), I would have to actually perform the cast to know whether it succeeds, but that would effectively 'execute' the expression during the analysis stage, which doesn't seem right.
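One way to sidestep that problem is to make the decision from the static data types only, for example via `Cast.canCast`, which never evaluates anything, and to accept that a string value which turns out not to be numeric simply casts to null at runtime, as Spark's other implicit casts behave. A rough sketch of that idea, not code from the PR:

import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{DataType, DoubleType, StringType}

// Decides the common comparison type by looking at the argument types only,
// so it is safe to call during analysis; no cast is actually performed here.
def commonFieldType(argTypes: Seq[DataType]): DataType =
  if (argTypes.forall(t => Cast.canCast(t, DoubleType))) DoubleType else StringType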
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106157856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- I think I should add an interface like `ImplicitCastInputTypes` (and have `Field` mix it in). The difference is that the new interface casts all parameters to the same type (at the very least, they can all be cast to string type). In the new interface, I first try to cast all parameters to a NumericType (if we don't do this, '2.3' won't be considered equal to '2.30'); if that fails, I cast all parameters to StringType. Do you think this is a good idea? @cloud-fan
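A minimal sketch of what such an interface might look like, under the assumption that a matching rule in `TypeCoercion` would look for this trait and insert the actual casts. The trait name is made up for illustration; it is not an existing Spark trait, and the numeric check is a simplification of the "try numeric first, fall back to string" idea described above.

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StringType}

// Hypothetical mixin in the spirit of ImplicitCastInputTypes: instead of
// declaring fixed input types, it asks the analyzer to coerce every child to
// a single common type.
trait CastChildrenToCommonType { self: Expression =>
  // Prefer a numeric comparison when every child is already numeric,
  // otherwise fall back to comparing everything as strings.
  def commonChildType: DataType =
    if (children.forall(_.dataType.isInstanceOf[NumericType])) DoubleType else StringType
}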
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r102671021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- In the current implementation I didn't do the implicit cast, in order to stay consistent with Hive. For example, `select field(3, '3', 3);` returns 2 in Hive, while it would return 1 if the implicit cast were done. So I think it may be better not to do the implicit cast, and to allow the children to have different types.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r102616898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- we need a rule in `TypeCoercion` to make sure all children are of same type(after implicit cast maybe?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
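For reference, other variadic expressions such as `Coalesce` and `CreateArray` already get this treatment from `TypeCoercion.FunctionArgumentConversion`, which widens all children to a common type. If `Field` were handled the same way, the extra case might look roughly like the sketch below; it assumes the existing `findWiderCommonType` helper in `TypeCoercion` and is not code from the PR.

// Hypothetical extra case inside TypeCoercion.FunctionArgumentConversion:
case f @ Field(children) if children.forall(_.resolved) &&
    children.map(_.dataType).distinct.size > 1 =>
  findWiderCommonType(children.map(_.dataType)) match {
    case Some(widerType) => Field(children.map(Cast(_, widerType)))
    case None => f  // no common type; leave it to checkInputDataTypes to report
  }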
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Current code is ok. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- I meant just genCode for the children with the same data type. genCode should be lightweight, so it is not a strong option. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
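A small sketch of what "only call genCode for the children whose type matches" could look like at the top of `doGenCode`, reusing the `dataTypeMatchIndex` field from the diff above. The surrounding `ctx`, `children` and `ExprCode` names are the ones already used in the PR; the fragment is illustrative rather than the PR's actual code.

// Generate code for the search expression and only for the type-matching
// children, instead of calling genCode on every child and filtering later.
val target = children.head.genCode(ctx)
val targetDataType = children.head.dataType
val matchingEvals: Seq[(ExprCode, Int)] =
  dataTypeMatchIndex.toSeq.map(i => (children(i).genCode(ctx), i))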
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- @viirya Maybe the order of `code` and `evalWithIndex` parameters should be changed. @tejasapatil I agree with your opinion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101766864 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- You mean do not gen any code, not calling `genCode` at all? I think maybe we can't modify the upper level code for this reason. Is there a better solution? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101690256 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- I think whats already present in the code is ok. Given that there is no better option without adding more complexity, lets stick with it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101688665 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- I think it looks like: ${evalChildren.zip(dataTypes).zipWithIndex.tail.filter { x => dataTypeMatchIndex.contains(x._2) }.foldRight("") { (code: String, evalWithIndex: ((ExprCode, DataType), Int)) => val ((eval, _), index) = evalWithIndex val condition = ctx.genEqual(targetDataType, eval.value, target.value) s""" ${eval.code} if ($condition) { ${ev.value} = ${index}; } else { $code } """ } You can do this with a function like you did before. It will have a empty "else" block at the end. However this doesn't affect the functionality, just dealing with how the code looks. I don't have strong option about this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
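For readability, here is a reflowed reconstruction of the idea in the snippet above (my restatement, not code from the PR): `foldRight` builds the nested chain so that each later comparison lands in the `else` branch of the previous one, and the innermost `else` body is simply the empty initial accumulator. It assumes `target`, `targetDataType`, `ctx`, `ev` and a `matchingEvals: Seq[(ExprCode, Int)]` of the type-matching children are already in scope inside `doGenCode`.

// Build the nested if/else chain from right to left; the accumulator starts
// as an empty string, so the deepest else block is empty.
val codes = matchingEvals.foldRight("") { case ((eval, index), elseCode) =>
  val condition = ctx.genEqual(targetDataType, eval.value, target.value)
  s"""
    ${eval.code}
    if ($condition) {
      ${ev.value} = $index;
    } else {
      $elseCode
    }
  """
}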
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101688198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure( +s"FIELD requires all arguments to be of AtomicType or explicitly indicating NULL") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { --- End diff -- This is the floating `else`. As @tejasapatil said, looks weird. 
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101673768 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- For now, I use `reduceRight`, which I think is a 'special case' function of `foldRight`. If I understand your meaning of floating `else` right(could you please explain it a little bit?), these 2 functions both can't avoid floating `else`, because we need nested `else` in `else` block, like this: `if (xxx) else { if (xxx) else { ... } } `, so if we avoid floating `else` in `genIfElseStructure`, `else` should be in `updateEval`, which will make the code unclear and complicated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101516174 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- I think you already call `genCode` here, and only filter type mismatch `ExprCode` in line 441. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101515786 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Oh, yeah, right, if use `foldLeft`, there is still a floating `else`. We can only use `foldRight` to remove it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101505199 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Sorry I don't understand, how to use `foldLeft` approach? I think we can only use `foldRight` or `reduceRight`, because the code for latter children should be nested inner. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101478064 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- Looks good to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101468470 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- There is a filter function on line 441 to avoid genCode for type mismatch children `filter(x => dataTypeMatchIndex.contains(x._2))` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101466954 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- You are right, but is it better to be `FIELD requires all arguments to be of AtomicType or explicitly indicating NULL` ? Because `NullType` might be confusing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101438354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Nit: Maybe we can use `foldLeft` to replace current approach to get rid of the floating `else`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
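A small, self-contained sketch of the fold idea suggested here: fold over the per-child snippets so each later check becomes the `else` branch of the one before it, instead of gluing a bare `else` between adjacent pieces. It is written with `foldRight`, which nests naturally from the inside out (a `foldLeft` over the reversed sequence is equivalent); the snippet strings are placeholders, not the PR's generated code.

```scala
object NestedIfElseSketch {
  def main(args: Array[String]): Unit = {
    // One snippet per type-matching child; each sets the result when it matches.
    val checks = Seq(
      "if (eq(value1, target)) { result = 1; }",
      "if (eq(value2, target)) { result = 2; }",
      "if (eq(value3, target)) { result = 3; }")

    // Fold from the right: the accumulated code becomes the `else` branch of
    // the check in front of it, so no snippet ever starts with a dangling `else`.
    val nested = checks.foldRight("") { (check, rest) =>
      if (rest.isEmpty) check
      else s"$check else {\n$rest\n}"
    }

    println(nested)
  }
}
```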
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101436701 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- nit: We may not need to genCode for the children with the data types different than the target. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101436180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- Oh. I mean the error message `s"FIELD requires all arguments to be of AtomicType or NullType"`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r99753353 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -20,8 +20,12 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ +import scala.annotation.tailrec --- End diff -- Sorry, it's my fault. I will run dev/scalastyle first before I push my changes next time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r99535960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,99 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +ev.copy(code = s""" + ${target.code} --- End diff -- Thanks, I have added a if to bypass the case when target.isNull ==true, which is actually the same as yours. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
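A hedged sketch of the short-circuit discussed in this exchange: the generated comparison chain is wrapped in a null check on the search expression, so none of the list expressions are evaluated when it is NULL and the result simply stays 0. The names `targetCode`, `targetIsNull`, and `comparisons` are placeholders for the corresponding pieces of `ExprCode`, not the PR's identifiers.

```scala
object NullTargetGuardSketch {
  // Wrap the comparison chain so it only runs when the target is not null;
  // the result defaults to 0 either way.
  def guarded(targetCode: String, targetIsNull: String, comparisons: String): String =
    s"""$targetCode
       |boolean isNull = false;
       |int value = 0;
       |if (!$targetIsNull) {
       |$comparisons
       |}""".stripMargin

  def main(args: Array[String]): Unit =
    println(guarded("UTF8String target = ...;", "targetIsNull",
      "if (eq(value1, target)) { value = 1; }"))
}
```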
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r97796700 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,99 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +ev.copy(code = s""" + ${target.code} --- End diff -- If `target.value == null`, we don't need to evaluate the expressions. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r97793794 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -20,8 +20,12 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ +import scala.annotation.tailrec --- End diff -- About the style: as I see it, we usually put the imports of Scala packages above the Spark packages. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
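For illustration only, the grouping described above would look like the following, with the `scala.*` import in its own group above the `org.apache.spark.*` group (only the imports quoted in this diff are shown; this is a formatting illustration, not the final header of the file):

```scala
import scala.annotation.tailrec

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
```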
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96348295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. --- End diff -- Good idea, thx! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- That's for user's explicit indication of NULL, that's legal in Hive's `field` expression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347281 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} --- End diff -- My bad. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Yes, that's right. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95732892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- So basically why we only support AtomicType and NullType is mainly due to being consistent with Hive? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95732577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Because Hive GenericUDFField supports primitive type(includes: boolean, byte, short, int...) & null. If the types correspond to Spark SQL, they'll be AtomicType & NullType. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95721703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. --- End diff -- I think it is better to move this sentence to below the next line. I.e., It's also acceptable to give parameters of different types. When the parameters have different types, ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720884 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Any reason we can only support the two types? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} --- End diff -- Unused change too? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720313 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} + import org.apache.hadoop.io.{LongWritable, Text} import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat} import org.scalatest.Matchers._ -import org.apache.spark.sql.catalyst.expressions.NamedExpression +import org.apache.spark.sql.catalyst.expressions.{Literal, NamedExpression} --- End diff -- Unused change? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95719439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, --- End diff -- This description "comparing will be done based on type firstly" looks unclear. Can you rephrase it a bit? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95719293 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- " or NullType"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95718514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. --- End diff -- search string -> search expression? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95718465 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, --- End diff -- type: paramters. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95102912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @chenghao-intel I have added that annotation and remove the `toList`. Still working on the rule for folding not match datatypes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95102842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { --- End diff -- Yes, because in `checkEvaluation` function there is : `checkEvaluationWithGeneratedMutableProjection` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080770 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean = { + val ((eval, dataType), index) = evalWithIndex + dataType.equals(targetDataType) +} + +ev.copy(code = s""" + ${target.code} + boolean ${ev.isNull} = false; + int ${ev.value} = 0; + ${rest.zip(restDataType).zipWithIndex.map(x => (x._1, x._2 + 1)).filter( --- End diff -- Cool, thx. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- `toList` probably causes performance overhead; I don't think we have to sacrifice performance just to use pattern matching. In the meantime, I still believe we don't have to check the data types at runtime. That should be done at `compile` time, or only once on the first call to `eval`. The `Field` evaluation is quite confusing; as @gatorsmile suggested, we need to describe how the value is evaluated when the sub-expressions' data types are different. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
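A self-contained sketch of the point about not re-checking data types at runtime: the indices of the arguments whose type can possibly match the first one are resolved once, when the expression is constructed, and `eval` only loops over those. The toy `Child` type below stands in for a catalyst `Expression`; this is an illustration of the idea under that assumption, not the PR's implementation.

```scala
// Toy stand-in for an Expression: a data type name plus an evaluator.
final case class Child(dataType: String, evaluate: () => Any)

final class FieldSketch(children: Seq[Child]) {
  // Resolved once at construction: 1-based indices whose type matches children.head.
  private val dataTypeMatchIndex: Array[Int] =
    children.zipWithIndex.tail
      .collect { case (c, i) if c.dataType == children.head.dataType => i }
      .toArray

  // eval never looks at data types again; it only evaluates the candidates.
  def eval(): Int = {
    val target = children.head.evaluate()
    if (target == null) 0
    else dataTypeMatchIndex.find { i =>
      val v = children(i).evaluate()
      v != null && v == target
    }.getOrElse(0)
  }
}

object FieldSketch {
  def main(args: Array[String]): Unit = {
    val f = new FieldSketch(Seq(
      Child("IntegerType", () => 10),
      Child("IntegerType", () => 9),
      Child("StringType", () => "10"), // never compared against the target
      Child("IntegerType", () => 10)))
    println(f.eval()) // 3
  }
}
```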
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Maybe I can change this function's name? But actually I can't think of a better name. : ) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080582 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- You can add the `@tailrec` annotation to declare that explicitly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @tejasapatil Actually, I think it's tail recursion, so the compiler will do the optimization, then it has the same performance with iteration edition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @gatorsmile Actually, the case: `checkAnswer(testData.selectExpr("field('è±è±ä¸ç', 'a', 1.23, true, 'è±è±ä¸ç')"), Row(4))` will produce a child: Seq[Expression] an ArrayBuffer, it's not a list, so can't use head::tail. So there are 2 ways: 1. remove the `toList` and do another pattern match for ArrayBuffer, which I think is not neat. 2. keep the `toList`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
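A possible middle ground, sketched with plain values rather than Catalyst expressions (an illustration, not the PR's code): the `+:` extractor, unlike `::`, matches any `Seq`, including the `ArrayBuffer` produced in that test case, so the `toList` could be dropped without adding a second pattern match.

    import scala.collection.mutable.ArrayBuffer

    // Walk any Seq (List, ArrayBuffer, ...) and return the 1-based index of the first
    // non-null element equal to the target, or 0 if none matches.
    def indexOfEqual(target: Any, params: Seq[Any], index: Int = 1): Int = params match {
      case Seq() => 0
      case head +: _ if head != null && head == target => index
      case _ +: tail => indexOfEqual(target, tail, index + 1)
    }

    println(indexOfEqual("c", List("a", "b", "c")))         // 3
    println(indexOfEqual("c", ArrayBuffer("a", "b", "c")))  // 3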
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95079031 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- I have removed line 181, since line 182 actually covers not found case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95078909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1528,6 +1528,18 @@ object functions { def factorial(e: Column): Column = withExpr { Factorial(e.expr) } /** +* Returns the index of str in (str1, str2, ...) list or 0 if not found. +* It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. +* +* @group normal_funcs +* @since 2.2.0 +*/ + @scala.annotation.varargs + def field(expr1: Column, expr2: Column, exprs: Column*): Column = withExpr { --- End diff -- Sure, I have removed this. Still curious, though: is it inappropriate to have it in the Dataset API? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060664 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); --- End diff -- I can give 2 examples, one for Integer, one for String. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060633 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", --- End diff -- Yeah, it's more reasonable to use expr(n), thanks. It should probably be AtomicType or NullType, to support users writing a literal null. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060564 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- @tejasapatil The first 3 strings are base test strings, the 5th is for null of string type. Probably I can remove the 4th one which is not useful. About the tests' role, could you please check another thread where I @ you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060448 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- @rxin My bad, will take it out --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060436 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- Line 180 is to test multi types of parameters; Line 181 is to test not found case; Line 182 is to test not found case when parameters are of multi types; Line 183 is to test null in parameter which has >=1 index I think maybe we should refer to Hive's field? In Hive, when not all arguments are numbers && not all arguments are strings, they are not compared as double. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060369 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- @tejasapatil It doesn't work with my code because it only support AtomicType. While user's writing of 'null' is NullType, I will add NullType support to be consistent with Hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
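A hypothetical sketch of the relaxed check described above, written as a free-standing helper and assuming it sits next to the expression code in the catalyst package (where `TypeCheckResult`, `AtomicType` and `NullType` are visible); it illustrates the idea, not the PR's final code.

    import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    import org.apache.spark.sql.catalyst.expressions.Expression
    import org.apache.spark.sql.types.{AtomicType, NullType}

    // Accept NullType alongside AtomicType, so that a user-written `null` argument such as
    // FIELD("tejas", 34, "patil", true, null, "tejas") passes analysis instead of failing
    // the type check.
    def checkFieldInputTypes(children: Seq[Expression]): TypeCheckResult = {
      if (children.length <= 1) {
        TypeCheckResult.TypeCheckFailure("FIELD requires at least 2 arguments")
      } else if (!children.forall(c =>
          c.dataType.isInstanceOf[AtomicType] || c.dataType == NullType)) {
        TypeCheckResult.TypeCheckFailure(
          "FIELD requires all arguments to be of AtomicType or NullType")
      } else {
        TypeCheckResult.TypeCheckSuccess
      }
    }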
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060307 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- @gatorsmile Sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. --- End diff -- change `str` to `expr` here as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055535 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- This fails with the patch. hc.sql("""SELECT FIELD("tejas", 34, "patil", true, null, "tejas") FROM src LIMIT 1""").collect.foreach(println) Removing `null` makes it work. Can you check on your side ? It worked with Hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean = { + val ((eval, dataType), index) = evalWithIndex + dataType.equals(targetDataType) +} + +ev.copy(code = s""" + ${target.code} + boolean ${ev.isNull} = false; + int ${ev.value} = 0; + ${rest.zip(restDataType).zipWithIndex.map(x => (x._1, x._2 + 1)).filter( --- End diff -- `.zipWithIndex.map(x => (x._1, x._2 + 1))` can be simplified as `.zip(Stream from 1)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
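A quick check of the suggested rewrite (a throwaway snippet, not from the PR): both spellings pair each element with a 1-based index, so they can be swapped freely.

    val rest = Seq("a", "b", "c")
    val viaZipWithIndex = rest.zipWithIndex.map(x => (x._1, x._2 + 1))
    val viaStream = rest.zip(Stream from 1)
    // Both give List((a,1), (b,2), (c,3)); zip stops at the end of the shorter side,
    // so the infinite Stream is never fully evaluated.
    assert(viaZipWithIndex == viaStream)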
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055160 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { --- End diff -- Do your unit tests cover generated code ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054994 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- Any reason to not do this iteratively ? I would suggest avoiding recursion to get better perf. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); --- End diff -- This UDF accepts any mix of atomic types, so one can even get fancy with the inputs. I would recommend mentioning that in the doc (given that you have tests for that below). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) --- End diff -- - nit: space after `if` - checkstyle will fail saying if-else needs to use braces --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
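For clarity, the nit applied to that snippet, reusing the names from the quoted eval method (just the style fix, nothing else changes):

    if (target == null) {
      0
    } else {
      findEqual(target, children.tail, 1)
    }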
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} --- End diff -- At this point the code at lines 427-428 (ie. `.filter(dataTypeEqualsTarget)`) have ensured that this will always be true. The generated code will have this as `true` and you might as well get rid of the check here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
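A sketch of the suggested simplification, reusing the names from the doGenCode fragment in the diff (so this is a fragment of that method, not standalone code): once candidates whose type differs from the target's have been filtered out, the generated guard only needs the value comparison.

    def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = {
      val ((eval, _), index) = evalWithIndex
      s"""
        ${eval.code}
        if (${ctx.genEqual(targetDataType, eval.value, target.value)}) {
          ${ev.value} = ${index};
        }
      """
    }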
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055087 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- I feel like you have comprehensive testing but at the same time feels like there is overlap amongst the tests in terms of coverage. There is room of reducing the tests while still having same coverage. eg you don't need 5 strings and use lesser. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
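To make the suggestion concrete, one possible trimmed-down set, reusing the literals defined in the quoted test and keeping roughly one case per behaviour (a sketch, not a prescription):

    checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3)    // basic string lookup
    checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3)    // numeric lookup
    checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3)  // non-primitive atomic type
    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6)  // mixed argument types
    checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0)    // not found
    checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0)  // null search expression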
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055050 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) --- End diff -- same as previous ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055128 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- From the name it felt like this method would put `code1` in `if` block and `code2` in `else` block but turns out thats not the case. That floating `else` looks weird. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054704 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- Can we avoid `toList` for each recursive call? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) --- End diff -- `findEqual(target, children.tail, index=1)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054605 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- What is the purpose of these checks? Based on MySQL's `field` function, the type casting rules is described as ``` If all arguments to FIELD() are strings, all arguments are compared as strings. If all arguments are numbers, they are compared as numbers. Otherwise, the arguments are compared as double. ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
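For illustration, the quoted MySQL rule phrased as a choice of comparison type in a hypothetical helper over Spark SQL's DataType; this only restates the MySQL behaviour above, it is not the coercion rule the PR implements.

    import org.apache.spark.sql.types._

    // All strings: compare as strings. All numbers: compare as numbers (no blanket cast
    // forced here). Otherwise: compare everything as double.
    def fieldComparisonType(argTypes: Seq[DataType]): Option[DataType] = {
      if (argTypes.forall(_ == StringType)) {
        Some(StringType)
      } else if (argTypes.forall(_.isInstanceOf[NumericType])) {
        None
      } else {
        Some(DoubleType)
      }
    }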
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054575 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- This is to test `null`. Could you add the description? ``` If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054547
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
--- End diff --
Could you fix the style, based on https://github.com/databricks/scala-style-guide#curly?
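For readers who do not follow the link: the linked guideline concerns curly braces around conditional branches. A toy, self-contained analogue of the flagged block written in that shape — a sketch of the style only, not the code that was eventually merged:

```scala
// Toy stand-in for the flagged if/else, written with braces per the guideline.
// indexOf returns -1 when absent, so "+ 1" gives the same 0-means-not-found
// convention as FIELD.
def firstMatch(target: String, candidates: Seq[String]): Int = {
  if (target == null) {
    0
  } else {
    candidates.indexOf(target) + 1
  }
}

// firstMatch("b", Seq("a", "b", "c")) == 2; firstMatch("z", Seq("a")) == 0
```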
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054508
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
--- End diff --
More examples please?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054387
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
--- End diff --
Can we use `expr1, expr2, expr3` here? The type can be any atomic type?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023996
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
@@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
 checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row)
 checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row)
 }
+
+ test("case field") {
--- End diff --
what's "case"?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95024068
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
--- End diff --
can we use strings as examples rather than integer literals?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023960
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1528,6 +1528,18 @@ object functions {
 def factorial(e: Column): Column = withExpr { Factorial(e.expr) }
 /**
+* Returns the index of str in (str1, str2, ...) list or 0 if not found.
+* It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+*
+* @group normal_funcs
+* @since 2.2.0
+*/
+ @scala.annotation.varargs
+ def field(expr1: Column, expr2: Column, exprs: Column*): Column = withExpr {
--- End diff --
can we remove this?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023939
--- Diff: python/pyspark/sql/functions.py ---
@@ -320,6 +320,22 @@ def countDistinct(col, *cols):
 return Column(jc)
+@since(2.2)
+def field(*cols):
--- End diff --
can we remove this?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94798265
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val evalChildren = children.map(_.genCode(ctx))
+val target = evalChildren(0)
+val targetDataType = children(0).dataType
+val rest = evalChildren.drop(1)
+val restDataType = children.drop(1).map(_.dataType)
+
+def updateEval(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): String = {
+ val ((eval, dataType), index) = evalWithIndex
+ s"""
+${eval.code}
+if (${dataType.equals(targetDataType)}
+ && ${ctx.genEqual(targetDataType, eval.value, target.value)}) {
+ ${ev.value} = ${index};
+}
+ """
+}
+
+def genIfElseStructure(code1: String, code2: String): String = {
+ s"""
+ ${code1}
+ else {
+ ${code2}
+ }
+ """
+}
+
+def dataTypeEqualsTarget(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): Boolean = {
--- End diff --
`def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean`
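The suggestion is purely notational: in Scala, `((A, B), C)` is syntactic sugar for `Tuple2[Tuple2[A, B], C]`, so the shorter spelling changes nothing about the type. A quick plain-Scala check (no Spark types needed):

```scala
// Both annotations denote the same type; the parenthesised form also pairs
// nicely with destructuring.
val x: ((String, Int), Boolean) = (("a", 1), true)
val y: Tuple2[Tuple2[String, Int], Boolean] = x  // compiles: identical types
val ((s, n), flag) = x                           // destructures cleanly
println(s"$s $n $flag")                          // prints: a 1 true
```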
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94797482
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
--- End diff --
Nit: one more space before `findEqual`
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94796481
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
--- End diff --
Nit: delete a space before *