[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- If we try to cast all types to `DoubleType`, the test case for all `StringType` will fail. While if we try to cast all types to `StringType`, parameters of '3' and '3.0' won't be determined as equal. I have a solution to balance, we look at the 1st parameter, if it's of `NumericType`, we implicitly cast all parameters to `DoubleType`, else we cast all parameters to `StringType`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
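For illustration, a minimal sketch of how the heuristic proposed above (first parameter numeric, so cast everything to DoubleType, otherwise cast everything to StringType) could be written as an analyzer rule. This is not code from the PR: the rule name FieldTypeCoercion is hypothetical, and it assumes the `Field` expression class shown in the diff above.

import org.apache.spark.sql.catalyst.expressions.{Cast, Field}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{DoubleType, NumericType, StringType}

// Hypothetical rule implementing the proposal above: look at the first
// argument; if it is numeric, cast every argument to DoubleType, otherwise
// fall back to casting every argument to StringType.
object FieldTypeCoercion extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Field(children) if children.forall(_.resolved) &&
        children.map(_.dataType).distinct.size > 1 =>
      val targetType = children.head.dataType match {
        case _: NumericType => DoubleType
        case _ => StringType
      }
      Field(children.map(c => if (c.dataType == targetType) c else Cast(c, targetType)))
  }
}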
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334083 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- I ran into a problem: if I first try to cast all parameters to DoubleType in `TypeCoercion` (as described in the 2nd paragraph of my last comment), I would have to actually perform the cast to know whether it succeeds, but that would effectively 'execute' the expression during the analysis stage, which doesn't seem right.
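One way to sidestep that problem is to make the decision from the static data types only, for example via `Cast.canCast`, which never evaluates anything, and to accept that a string value which turns out not to be numeric simply casts to null at runtime, as Spark's other implicit casts behave. A rough sketch of that idea, not code from the PR:

import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{DataType, DoubleType, StringType}

// Decides the common comparison type by looking at the argument types only,
// so it is safe to call during analysis; no cast is actually performed here.
def commonFieldType(argTypes: Seq[DataType]): DataType =
  if (argTypes.forall(t => Cast.canCast(t, DoubleType))) DoubleType else StringType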
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106157856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- I think I should add an interface like `ImplicitCastInputTypes` (and have `Field` mix it in). The difference is that the new interface casts all parameters to the same type (at the very least, they can all be cast to string type). In the new interface, I first try to cast all parameters to a NumericType (if we don't do this, '2.3' won't be considered equal to '2.30'); if that fails, I cast all parameters to StringType. Do you think this is a good idea? @cloud-fan
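A minimal sketch of what such an interface might look like, under the assumption that a matching rule in `TypeCoercion` would look for this trait and insert the actual casts. The trait name is made up for illustration; it is not an existing Spark trait, and the numeric check is a simplification of the "try numeric first, fall back to string" idea described above.

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StringType}

// Hypothetical mixin in the spirit of ImplicitCastInputTypes: instead of
// declaring fixed input types, it asks the analyzer to coerce every child to
// a single common type.
trait CastChildrenToCommonType { self: Expression =>
  // Prefer a numeric comparison when every child is already numeric,
  // otherwise fall back to comparing everything as strings.
  def commonChildType: DataType =
    if (children.forall(_.dataType.isInstanceOf[NumericType])) DoubleType else StringType
}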
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r102671021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- In the current implementation I didn't do the implicit cast, in order to stay consistent with Hive. For example, `select field(3, '3', 3);` returns 2 in Hive, while it would return 1 if the implicit cast were done. So I think it may be better not to do the implicit cast, and to allow the children to have different types.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r102616898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- we need a rule in `TypeCoercion` to make sure all children are of same type(after implicit cast maybe?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
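For reference, other variadic expressions such as `Coalesce` and `CreateArray` already get this treatment from `TypeCoercion.FunctionArgumentConversion`, which widens all children to a common type. If `Field` were handled the same way, the extra case might look roughly like the sketch below; it assumes the existing `findWiderCommonType` helper in `TypeCoercion` and is not code from the PR.

// Hypothetical extra case inside TypeCoercion.FunctionArgumentConversion:
case f @ Field(children) if children.forall(_.resolved) &&
    children.map(_.dataType).distinct.size > 1 =>
  findWiderCommonType(children.map(_.dataType)) match {
    case Some(widerType) => Field(children.map(Cast(_, widerType)))
    case None => f  // no common type; leave it to checkInputDataTypes to report
  }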
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Current code is ok. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- I meant just genCode for the children with the same data type. genCode should be lightweight, so it is not a strong option. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
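A small sketch of what "only call genCode for the children whose type matches" could look like at the top of `doGenCode`, reusing the `dataTypeMatchIndex` field from the diff above. The surrounding `ctx`, `children` and `ExprCode` names are the ones already used in the PR; the fragment is illustrative rather than the PR's actual code.

// Generate code for the search expression and only for the type-matching
// children, instead of calling genCode on every child and filtering later.
val target = children.head.genCode(ctx)
val targetDataType = children.head.dataType
val matchingEvals: Seq[(ExprCode, Int)] =
  dataTypeMatchIndex.toSeq.map(i => (children(i).genCode(ctx), i))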
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101770055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- @viirya Maybe the order of `code` and `evalWithIndex` parameters should be changed. @tejasapatil I agree with your opinion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101766864 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- You mean do not gen any code, not calling `genCode` at all? I think maybe we can't modify the upper level code for this reason. Is there a better solution? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101690256 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- I think whats already present in the code is ok. Given that there is no better option without adding more complexity, lets stick with it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101688665 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- I think it looks like: ${evalChildren.zip(dataTypes).zipWithIndex.tail.filter { x => dataTypeMatchIndex.contains(x._2) }.foldRight("") { (code: String, evalWithIndex: ((ExprCode, DataType), Int)) => val ((eval, _), index) = evalWithIndex val condition = ctx.genEqual(targetDataType, eval.value, target.value) s""" ${eval.code} if ($condition) { ${ev.value} = ${index}; } else { $code } """ } You can do this with a function like you did before. It will have a empty "else" block at the end. However this doesn't affect the functionality, just dealing with how the code looks. I don't have strong option about this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
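For readability, here is a reflowed reconstruction of the idea in the snippet above (my restatement, not code from the PR): `foldRight` builds the nested chain so that each later comparison lands in the `else` branch of the previous one, and the innermost `else` body is simply the empty initial accumulator. It assumes `target`, `targetDataType`, `ctx`, `ev` and a `matchingEvals: Seq[(ExprCode, Int)]` of the type-matching children are already in scope inside `doGenCode`.

// Build the nested if/else chain from right to left; the accumulator starts
// as an empty string, so the deepest else block is empty.
val codes = matchingEvals.foldRight("") { case ((eval, index), elseCode) =>
  val condition = ctx.genEqual(targetDataType, eval.value, target.value)
  s"""
    ${eval.code}
    if ($condition) {
      ${ev.value} = $index;
    } else {
      $elseCode
    }
  """
}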
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101688198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,105 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure( +s"FIELD requires all arguments to be of AtomicType or explicitly indicating NULL") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { --- End diff -- This is the floating `else`. As @tejasapatil said, looks weird. 
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101673768 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- For now, I use `reduceRight`, which I think is a 'special case' function of `foldRight`. If I understand your meaning of floating `else` right(could you please explain it a little bit?), these 2 functions both can't avoid floating `else`, because we need nested `else` in `else` block, like this: `if (xxx) else { if (xxx) else { ... } } `, so if we avoid floating `else` in `genIfElseStructure`, `else` should be in `updateEval`, which will make the code unclear and complicated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101516174 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- I think you already call `genCode` here, and only filter type mismatch `ExprCode` in line 441. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101515786 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Oh, yeah, right, if use `foldLeft`, there is still a floating `else`. We can only use `foldRight` to remove it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101505199 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Sorry I don't understand, how to use `foldLeft` approach? I think we can only use `foldRight` or `reduceRight`, because the code for latter children should be nested inner. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101478064 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- Looks good to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101468470 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- There is a filter function on line 441 to avoid genCode for type mismatch children `filter(x => dataTypeMatchIndex.contains(x._2))` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101466954 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- You are right, but is it better to be `FIELD requires all arguments to be of AtomicType or explicitly indicating NULL` ? Because `NullType` might be confusing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101438354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Nit: Maybe we can use `foldLeft` to replace current approach to get rid of the floating `else`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
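A small, self-contained sketch of the fold idea suggested here: fold over the per-child snippets so each later check becomes the `else` branch of the one before it, instead of gluing a bare `else` between adjacent pieces. It is written with `foldRight`, which nests naturally from the inside out (a `foldLeft` over the reversed sequence is equivalent); the snippet strings are placeholders, not the PR's generated code.

```scala
object NestedIfElseSketch {
  def main(args: Array[String]): Unit = {
    // One snippet per type-matching child; each sets the result when it matches.
    val checks = Seq(
      "if (eq(value1, target)) { result = 1; }",
      "if (eq(value2, target)) { result = 2; }",
      "if (eq(value3, target)) { result = 3; }")

    // Fold from the right: the accumulated code becomes the `else` branch of
    // the check in front of it, so no snippet ever starts with a dangling `else`.
    val nested = checks.foldRight("") { (check, rest) =>
      if (rest.isEmpty) check
      else s"$check else {\n$rest\n}"
    }

    println(nested)
  }
}
```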
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101436701 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +343,104 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +// scalastyle:on line.size.limit +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) --- End diff -- nit: We may not need to genCode for the children with the data types different than the target. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r101436180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- Oh. I mean the error message `s"FIELD requires all arguments to be of AtomicType or NullType"`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r99753353 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -20,8 +20,12 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ +import scala.annotation.tailrec --- End diff -- Sorry, it's my fault. I will run dev/scalastyle first before I push my changes next time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r99535960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,99 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +ev.copy(code = s""" + ${target.code} --- End diff -- Thanks, I have added a if to bypass the case when target.isNull ==true, which is actually the same as yours. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
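A hedged sketch of the short-circuit discussed in this exchange: the generated comparison chain is wrapped in a null check on the search expression, so none of the list expressions are evaluated when it is NULL and the result simply stays 0. The names `targetCode`, `targetIsNull`, and `comparisons` are placeholders for the corresponding pieces of `ExprCode`, not the PR's identifiers.

```scala
object NullTargetGuardSketch {
  // Wrap the comparison chain so it only runs when the target is not null;
  // the result defaults to 0 either way.
  def guarded(targetCode: String, targetIsNull: String, comparisons: String): String =
    s"""$targetCode
       |boolean isNull = false;
       |int value = 0;
       |if (!$targetIsNull) {
       |$comparisons
       |}""".stripMargin

  def main(args: Array[String]): Unit =
    println(guarded("UTF8String target = ...;", "targetIsNull",
      "if (eq(value1, target)) { value = 1; }"))
}
```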
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r97796700 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,99 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. When the parameters have different + * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType, + * while 999's type is IntegerType, so that no further comparison need to be done since they have + * different types. + * If the search expression is NULL, the return value is 0 because NULL fails equality comparison + * with any value. + * To also point out, no implicit cast will be done in this expression. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +@tailrec def findEqual(index: Int): Int = { + if (index == dataTypeMatchIndex.length) { +0 + } else { +val value = children(dataTypeMatchIndex(index)).eval(input) +if (value != null && ordering.equiv(target, value)) { + dataTypeMatchIndex(index) +} else { + findEqual(index + 1) +} + } +} +if (target == null) 0 else findEqual(index = 0) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val dataTypes = children.map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, _), index) = evalWithIndex + s""" +${eval.code} +if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +ev.copy(code = s""" + ${target.code} --- End diff -- If `target.value == null`, we don't need to evaluate the expressions. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r97793794 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -20,8 +20,12 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ +import scala.annotation.tailrec --- End diff -- About the style: as I see it, we usually put the imports of Scala packages above the Spark packages. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
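For illustration only, the grouping described above would look like the following, with the `scala.*` import in its own group above the `org.apache.spark.*` group (only the imports quoted in this diff are shown; this is a formatting illustration, not the final header of the file):

```scala
import scala.annotation.tailrec

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
```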
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96348295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. --- End diff -- Good idea, thx! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- That's for user's explicit indication of NULL, that's legal in Hive's `field` expression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347281 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} --- End diff -- My bad. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Yes, that's right. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95732892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- So basically why we only support AtomicType and NullType is mainly due to being consistent with Hive? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95732577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Because Hive GenericUDFField supports primitive type(includes: boolean, byte, short, int...) & null. If the types correspond to Spark SQL, they'll be AtomicType & NullType. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95721703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. --- End diff -- I think it is better to move this sentence to below the next line. I.e., It's also acceptable to give parameters of different types. When the parameters have different types, ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720884 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Any reason we can only support the two types? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} --- End diff -- Unused change too? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95720313 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} + import org.apache.hadoop.io.{LongWritable, Text} import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat} import org.scalatest.Matchers._ -import org.apache.spark.sql.catalyst.expressions.NamedExpression +import org.apache.spark.sql.catalyst.expressions.{Literal, NamedExpression} --- End diff -- Unused change? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95719439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, --- End diff -- This description "comparing will be done based on type firstly" looks unclear. Can you rephrase it a bit? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95719293 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- " or NullType"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95718514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. --- End diff -- search string -> search expression? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95718465 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, --- End diff -- type: paramters. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95102912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @chenghao-intel I have added that annotation and remove the `toList`. Still working on the rule for folding not match datatypes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95102842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { --- End diff -- Yes, because in `checkEvaluation` function there is : `checkEvaluationWithGeneratedMutableProjection` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080770 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean = { + val ((eval, dataType), index) = evalWithIndex + dataType.equals(targetDataType) +} + +ev.copy(code = s""" + ${target.code} + boolean ${ev.isNull} = false; + int ${ev.value} = 0; + ${rest.zip(restDataType).zipWithIndex.map(x => (x._1, x._2 + 1)).filter( --- End diff -- Cool, thx. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- `toList` probably causes performance overhead; I don't think we have to sacrifice performance just to use pattern matching. In the meantime, I still believe we don't have to check the data types at runtime. That should be done at `compile` time, or only once on the first call to `eval`. The `Field` evaluation is quite confusing; as @gatorsmile suggested, we need to describe how the value is evaluated when the sub-expressions' data types are different. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
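A self-contained sketch of the point about not re-checking data types at runtime: the indices of the arguments whose type can possibly match the first one are resolved once, when the expression is constructed, and `eval` only loops over those. The toy `Child` type below stands in for a catalyst `Expression`; this is an illustration of the idea under that assumption, not the PR's implementation.

```scala
// Toy stand-in for an Expression: a data type name plus an evaluator.
final case class Child(dataType: String, evaluate: () => Any)

final class FieldSketch(children: Seq[Child]) {
  // Resolved once at construction: 1-based indices whose type matches children.head.
  private val dataTypeMatchIndex: Array[Int] =
    children.zipWithIndex.tail
      .collect { case (c, i) if c.dataType == children.head.dataType => i }
      .toArray

  // eval never looks at data types again; it only evaluates the candidates.
  def eval(): Int = {
    val target = children.head.evaluate()
    if (target == null) 0
    else dataTypeMatchIndex.find { i =>
      val v = children(i).evaluate()
      v != null && v == target
    }.getOrElse(0)
  }
}

object FieldSketch {
  def main(args: Array[String]): Unit = {
    val f = new FieldSketch(Seq(
      Child("IntegerType", () => 10),
      Child("IntegerType", () => 9),
      Child("StringType", () => "10"), // never compared against the target
      Child("IntegerType", () => 10)))
    println(f.eval()) // 3
  }
}
```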
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- Maybe I can change this function's name? But actually I can't think of a better name. : ) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080582 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- You can add the `@tailrec` annotation to declare that explicitly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @tejasapatil Actually, I think it's tail recursion, so the compiler will do the optimization, then it has the same performance with iteration edition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- @gatorsmile Actually, the case: `checkAnswer(testData.selectExpr("field('è±è±ä¸ç', 'a', 1.23, true, 'è±è±ä¸ç')"), Row(4))` will produce a child: Seq[Expression] an ArrayBuffer, it's not a list, so can't use head::tail. So there are 2 ways: 1. remove the `toList` and do another pattern match for ArrayBuffer, which I think is not neat. 2. keep the `toList`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
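A possible middle ground, sketched with plain values rather than Catalyst expressions (an illustration, not the PR's code): the `+:` extractor, unlike `::`, matches any `Seq`, including the `ArrayBuffer` produced in that test case, so the `toList` could be dropped without adding a second pattern match.

    import scala.collection.mutable.ArrayBuffer

    // Walk any Seq (List, ArrayBuffer, ...) and return the 1-based index of the first
    // non-null element equal to the target, or 0 if none matches.
    def indexOfEqual(target: Any, params: Seq[Any], index: Int = 1): Int = params match {
      case Seq() => 0
      case head +: _ if head != null && head == target => index
      case _ +: tail => indexOfEqual(target, tail, index + 1)
    }

    println(indexOfEqual("c", List("a", "b", "c")))         // 3
    println(indexOfEqual("c", ArrayBuffer("a", "b", "c")))  // 3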
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95079031 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- I have removed line 181, since line 182 actually covers not found case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95078909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1528,6 +1528,18 @@ object functions { def factorial(e: Column): Column = withExpr { Factorial(e.expr) } /** +* Returns the index of str in (str1, str2, ...) list or 0 if not found. +* It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. +* +* @group normal_funcs +* @since 2.2.0 +*/ + @scala.annotation.varargs + def field(expr1: Column, expr2: Column, exprs: Column*): Column = withExpr { --- End diff -- Sure, I have removed this. Still curious, though: is it inappropriate to have it in the Dataset API? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060664 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); --- End diff -- I can give 2 examples, one for Integer, one for String. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060633 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", --- End diff -- Yeah, it's more reasonable to use expr(n), thanks. It should probably be AtomicType or NullType, to support users writing a literal null. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060564 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- @tejasapatil The first 3 strings are base test strings, the 5th is for null of string type. Probably I can remove the 4th one which is not useful. About the tests' role, could you please check another thread where I @ you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060448 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- @rxin My bad, will take it out --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060436 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- Line 180 is to test multi types of parameters; Line 181 is to test not found case; Line 182 is to test not found case when parameters are of multi types; Line 183 is to test null in parameter which has >=1 index I think maybe we should refer to Hive's field? In Hive, when not all arguments are numbers && not all arguments are strings, they are not compared as double. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060369 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- @tejasapatil It doesn't work with my code because it only support AtomicType. While user's writing of 'null' is NullType, I will add NullType support to be consistent with Hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
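A hypothetical sketch of the relaxed check described above, written as a free-standing helper and assuming it sits next to the expression code in the catalyst package (where `TypeCheckResult`, `AtomicType` and `NullType` are visible); it illustrates the idea, not the PR's final code.

    import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    import org.apache.spark.sql.catalyst.expressions.Expression
    import org.apache.spark.sql.types.{AtomicType, NullType}

    // Accept NullType alongside AtomicType, so that a user-written `null` argument such as
    // FIELD("tejas", 34, "patil", true, null, "tejas") passes analysis instead of failing
    // the type check.
    def checkFieldInputTypes(children: Seq[Expression]): TypeCheckResult = {
      if (children.length <= 1) {
        TypeCheckResult.TypeCheckFailure("FIELD requires at least 2 arguments")
      } else if (!children.forall(c =>
          c.dataType.isInstanceOf[AtomicType] || c.dataType == NullType)) {
        TypeCheckResult.TypeCheckFailure(
          "FIELD requires all arguments to be of AtomicType or NullType")
      } else {
        TypeCheckResult.TypeCheckSuccess
      }
    }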
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95060307 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- @gatorsmile Sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. --- End diff -- change `str` to `expr` here as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055535 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- This fails with the patch. hc.sql("""SELECT FIELD("tejas", 34, "patil", true, null, "tejas") FROM src LIMIT 1""").collect.foreach(println) Removing `null` makes it work. Can you check on your side ? It worked with Hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { + s""" + ${code1} + else { + ${code2} + } + """ +} + +def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean = { + val ((eval, dataType), index) = evalWithIndex + dataType.equals(targetDataType) +} + +ev.copy(code = s""" + ${target.code} + boolean ${ev.isNull} = false; + int ${ev.value} = 0; + ${rest.zip(restDataType).zipWithIndex.map(x => (x._1, x._2 + 1)).filter( --- End diff -- `.zipWithIndex.map(x => (x._1, x._2 + 1))` can be simplified as `.zip(Stream from 1)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
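A quick check of the suggested rewrite (a throwaway snippet, not from the PR): both spellings pair each element with a 1-based index, so they can be swapped freely.

    val rest = Seq("a", "b", "c")
    val viaZipWithIndex = rest.zipWithIndex.map(x => (x._1, x._2 + 1))
    val viaStream = rest.zip(Stream from 1)
    // Both give List((a,1), (b,2), (c,3)); zip stops at the end of the shorter side,
    // so the infinite Stream is never fully evaluated.
    assert(viaZipWithIndex == viaStream)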
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055160 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { --- End diff -- Do your unit tests cover generated code ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054994 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- Any reason to not do this iteratively ? I would suggest avoiding recursion to get better perf. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); --- End diff -- This UDF accepts any mix of atomic types, so one can even get fancy with the inputs. I would recommend mentioning that in the doc (given that you have tests for that below). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) --- End diff -- - nit: space after `if` - checkstyle will fail saying if-else needs to use braces --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
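For clarity, the nit applied to that snippet, reusing the names from the quoted eval method (just the style fix, nothing else changes):

    if (target == null) {
      0
    } else {
      findEqual(target, children.tail, 1)
    }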
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} --- End diff -- At this point the code at lines 427-428 (ie. `.filter(dataTypeEqualsTarget)`) have ensured that this will always be true. The generated code will have this as `true` and you might as well get rid of the check here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
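A sketch of the suggested simplification, reusing the names from the doGenCode fragment in the diff (so this is a fragment of that method, not standalone code): once candidates whose type differs from the target's have been filtered out, the generated guard only needs the value comparison.

    def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = {
      val ((eval, _), index) = evalWithIndex
      s"""
        ${eval.code}
        if (${ctx.genEqual(targetDataType, eval.value, target.value)}) {
          ${ev.value} = ${index};
        }
      """
    }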
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055087 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { --- End diff -- I feel like you have comprehensive testing but at the same time feels like there is overlap amongst the tests in terms of coverage. There is room of reducing the tests while still having same coverage. eg you don't need 5 strings and use lesser. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
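To make the suggestion concrete, one possible trimmed-down set, reusing the literals defined in the quoted test and keeping roughly one case per behaviour (a sketch, not a prescription):

    checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3)    // basic string lookup
    checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3)    // numeric lookup
    checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3)  // non-primitive atomic type
    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6)  // mixed argument types
    checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0)    // not found
    checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0)  // null search expression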
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055050 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) --- End diff -- same as previous ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95055128 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val target = evalChildren(0) +val targetDataType = children(0).dataType +val rest = evalChildren.drop(1) +val restDataType = children.drop(1).map(_.dataType) + +def updateEval(evalWithIndex: ((ExprCode, DataType), Int)): String = { + val ((eval, dataType), index) = evalWithIndex + s""" +${eval.code} +if (${dataType.equals(targetDataType)} + && ${ctx.genEqual(targetDataType, eval.value, target.value)}) { + ${ev.value} = ${index}; +} + """ +} + +def genIfElseStructure(code1: String, code2: String): String = { --- End diff -- From the name it felt like this method would put `code1` in `if` block and `code2` in `else` block but turns out thats not the case. That floating `else` looks weird. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054704 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { --- End diff -- Can we avoid `toList` for each recursive call? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of str in (str1, str2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType. + */ +@ExpressionDescription( + usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + """) +case class Field(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") +} else + TypeCheckResult.TypeCheckSuccess + } + + override def dataType: DataType = IntegerType + + override def eval(input: InternalRow): Any = { +val target = children.head.eval(input) +val targetDataType = children.head.dataType +def findEqual(target: Any, params: Seq[Expression], index: Int): Int = { + params.toList match { +case Nil => 0 +case head::tail if targetDataType == head.dataType + && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index +case _ => findEqual(target, params.tail, index + 1) + } +} +if(target == null) + 0 +else + findEqual(target, children.tail, 1) --- End diff -- `findEqual(target, children.tail, index=1)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054605 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) --- End diff -- What is the purpose of these checks? Based on MySQL's `field` function, the type casting rules is described as ``` If all arguments to FIELD() are strings, all arguments are compared as strings. If all arguments are numbers, they are compared as numbers. Otherwise, the arguments are compared as double. ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
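For illustration, the quoted MySQL rule phrased as a choice of comparison type in a hypothetical helper over Spark SQL's DataType; this only restates the MySQL behaviour above, it is not the coercion rule the PR implements.

    import org.apache.spark.sql.types._

    // All strings: compare as strings. All numbers: compare as numbers (no blanket cast
    // forced here). Otherwise: compare everything as double.
    def fieldComparisonType(argTypes: Seq[DataType]): Option[DataType] = {
      if (argTypes.forall(_ == StringType)) {
        Some(StringType)
      } else if (argTypes.forall(_.isInstanceOf[NumericType])) {
        None
      } else {
        Some(DoubleType)
      }
    }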
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054575 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala --- @@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row) checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row) } + + test("case field") { +val str1 = Literal("è±è±ä¸ç") +val str2 = Literal("a") +val str3 = Literal("b") +val str4 = Literal("") +val str5 = Literal("999") +val strNull = Literal.create(null, StringType) + +val bool1 = Literal(true) +val bool2 = Literal(false) + +val int1 = Literal(1) +val int2 = Literal(2) +val int3 = Literal(3) +val int4 = Literal(999) +val intNull = Literal.create(null, IntegerType) + +val double1 = Literal(1.221) +val double2 = Literal(1.222) +val double3 = Literal(1.224) + +val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1)) +val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1)) +val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1)) + +val date1 = Literal(new Date(1949, 1, 1)) +val date2 = Literal(new Date(1979, 1, 1)) +val date3 = Literal(new Date(1989, 1, 1)) + +checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3) +checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1) +checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1) +checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2) +checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3) +checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3) +checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3) +checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6) +checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0) +checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0) +checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0) --- End diff -- This is to test `null`. Could you add the description? ``` If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054547
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
--- End diff --
Could you fix the style, based on https://github.com/databricks/scala-style-guide#curly?
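For readers who do not follow the link: the linked guideline concerns curly braces around conditional branches. A toy, self-contained analogue of the flagged block written in that shape — a sketch of the style only, not the code that was eventually merged:

```scala
// Toy stand-in for the flagged if/else, written with braces per the guideline.
// indexOf returns -1 when absent, so "+ 1" gives the same 0-means-not-found
// convention as FIELD.
def firstMatch(target: String, candidates: Seq[String]): Int = {
  if (target == null) {
    0
  } else {
    candidates.indexOf(target) + 1
  }
}

// firstMatch("b", Seq("a", "b", "c")) == 2; firstMatch("z", Seq("a")) == 0
```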
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054508
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
--- End diff --
More examples please?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054387
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
--- End diff --
Can we use `expr1, expr2, expr3` here? The type can be any atomic type?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023996
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
@@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
 checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row)
 checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row)
 }
+
+ test("case field") {
--- End diff --
what's "case"?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95024068
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
--- End diff --
can we use strings as examples rather than integer literals?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023960
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1528,6 +1528,18 @@ object functions {
 def factorial(e: Column): Column = withExpr { Factorial(e.expr) }
 /**
+* Returns the index of str in (str1, str2, ...) list or 0 if not found.
+* It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+*
+* @group normal_funcs
+* @since 2.2.0
+*/
+ @scala.annotation.varargs
+ def field(expr1: Column, expr2: Column, exprs: Column*): Column = withExpr {
--- End diff --
can we remove this?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95023939
--- Diff: python/pyspark/sql/functions.py ---
@@ -320,6 +320,22 @@ def countDistinct(col, *cols):
 return Column(jc)
+@since(2.2)
+def field(*cols):
--- End diff --
can we remove this?
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94798265
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val evalChildren = children.map(_.genCode(ctx))
+val target = evalChildren(0)
+val targetDataType = children(0).dataType
+val rest = evalChildren.drop(1)
+val restDataType = children.drop(1).map(_.dataType)
+
+def updateEval(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): String = {
+ val ((eval, dataType), index) = evalWithIndex
+ s"""
+${eval.code}
+if (${dataType.equals(targetDataType)}
+ && ${ctx.genEqual(targetDataType, eval.value, target.value)}) {
+ ${ev.value} = ${index};
+}
+ """
+}
+
+def genIfElseStructure(code1: String, code2: String): String = {
+ s"""
+ ${code1}
+ else {
+ ${code2}
+ }
+ """
+}
+
+def dataTypeEqualsTarget(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): Boolean = {
--- End diff --
`def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean`
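The suggestion is purely notational: in Scala, `((A, B), C)` is syntactic sugar for `Tuple2[Tuple2[A, B], C]`, so the shorter spelling changes nothing about the type. A quick plain-Scala check (no Spark types needed):

```scala
// Both annotations denote the same type; the parenthesised form also pairs
// nicely with destructuring.
val x: ((String, Int), Boolean) = (("a", 1), true)
val y: Tuple2[Tuple2[String, Int], Boolean] = x  // compiles: identical types
val ((s, n), flag) = x                           // destructures cleanly
println(s"$s $n $flag")                          // prints: a 1 true
```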
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94797482
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+ extended = """
+Examples:
+ > SELECT _FUNC_(10, 9, 3, 10, 4);
+ 3
+ """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+ override def nullable: Boolean = false
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+} else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+ TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+} else
+ TypeCheckResult.TypeCheckSuccess
+ }
+
+ override def dataType: DataType = IntegerType
+
+ override def eval(input: InternalRow): Any = {
+val target = children.head.eval(input)
+val targetDataType = children.head.dataType
+def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+ params.toList match {
+case Nil => 0
+case head::tail if targetDataType == head.dataType
+ && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+case _ => findEqual(target, params.tail, index + 1)
+ }
+}
+if(target == null)
+ 0
+else
+ findEqual(target, children.tail, 1)
--- End diff --
Nit: one more space before `findEqual`
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94796481
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
 }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
--- End diff --
Nit: delete a space before *