[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20981


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180685351
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implement `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Hehe - good point. Let's leave it here for now.
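
For context, a condensed sketch of the accessor being discussed, kept in `InternalRow` (representative cases only; the full case list appears in the earlier revision's diff further down this thread, and this is a sketch rather than the merged code):

```scala
import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
import org.apache.spark.sql.types._

// Resolve the getter once per data type; callers then invoke it per row or element.
def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
  case BooleanType => (input, ordinal) => input.getBoolean(ordinal)
  case IntegerType | DateType => (input, ordinal) => input.getInt(ordinal)
  case StringType => (input, ordinal) => input.getUTF8String(ordinal)
  case _ => (input, ordinal) => input.get(ordinal, dataType) // generic fallback
}
```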


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180684792
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implement `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

As `SpecializedGetters` is a Java interface, it seems we can't create a companion object for it. Or am I missing something?
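
For what it's worth, a Scala companion object has to live in the same source file as its type, and `SpecializedGetters` is defined in a Java file, so the closest alternative would be a standalone helper object. A sketch with an invented name, not a true companion:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
import org.apache.spark.sql.types.DataType

// Hypothetical standalone helper (illustrative name); it merely delegates to
// the accessor added in this PR and is not a companion of the Java interface.
object SpecializedGettersUtil {
  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any =
    InternalRow.getAccessor(dataType)
}
```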


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180583831
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implement `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Ok.


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180583775
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -65,11 +65,19 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
     checkEvaluationWithOptimization(expr, catalystValue, inputRow)
   }
 
+
+  private def getActualDataType(dt: DataType): DataType = dt match {
--- End diff --

I can use the method you added.
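
The method referred to presumably unwraps a UDT to its underlying Catalyst type, along these lines (a sketch with an illustrative name, not necessarily the exact API):

```scala
import org.apache.spark.sql.types.{DataType, UserDefinedType}

// Recurse in case a UDT's sqlType is itself a UDT.
def actualDataType(dt: DataType): DataType = dt match {
  case udt: UserDefinedType[_] => actualDataType(udt.sqlType)
  case _ => dt
}
```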


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180475548
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implement `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Perhaps we should move this to the companion object of `SpecializedGetters`?


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180468634
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -65,11 +65,19 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
     checkEvaluationWithOptimization(expr, catalystValue, inputRow)
   }
 
+
+  private def getActualDataType(dt: DataType): DataType = dt match {
--- End diff --

I added a similar method to the `UserDefinedType` companion. Which one shall we add?


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180423825
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Ok.


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180394552
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Well, you could generalize it to take an ordinal?


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180356541
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Ah, no. The accessor in #20984 takes the ordinal as an input parameter.


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180356241
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Maybe we can make the accessor accept `SpecializedGetters`, so this can be reused in #20984 too...
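
Both `InternalRow` and `ArrayData` implement `SpecializedGetters`, so a single accessor value could serve both call sites. A minimal sketch (hypothetical values):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
import org.apache.spark.sql.catalyst.util.ArrayData

// One accessor, usable for any SpecializedGetters implementation.
val getIntAt: (SpecializedGetters, Int) => Any = (input, ordinal) => input.getInt(ordinal)

val row: InternalRow = InternalRow(1, 2, 3)
val arr: ArrayData = ArrayData.toArrayData(Array(1, 2, 3))
getIntAt(row, 0) // field 0 of a row
getIntAt(arr, 2) // element 2 of an array
```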


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008583
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,25 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
+    case BooleanType => (input) => input.getBoolean(ordinal)
+    case ByteType => (input) => input.getByte(ordinal)
+    case ShortType => (input) => input.getShort(ordinal)
+    case IntegerType | DateType => (input) => input.getInt(ordinal)
+    case LongType | TimestampType => (input) => input.getLong(ordinal)
+    case FloatType => (input) => input.getFloat(ordinal)
+    case DoubleType => (input) => input.getDouble(ordinal)
+    case StringType => (input) => input.getUTF8String(ordinal)
+    case BinaryType => (input) => input.getBinary(ordinal)
+    case CalendarIntervalType => (input) => input.getInterval(ordinal)
+    case t: DecimalType => (input) => input.getDecimal(ordinal, t.precision, t.scale)
+    case t: StructType => (input) => input.getStruct(ordinal, t.size)
+    case _: ArrayType => (input) => input.getArray(ordinal)
+    case _: MapType => (input) => input.getMap(ordinal)
+    case _ => (input) => input.get(ordinal, dataType)
--- End diff --

Handle `UDT`?
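
One way the UDT case could slot into this match (an assumed shape; the merged change may differ) is to unwrap the UDT and reuse the accessor for its underlying SQL type:

```scala
def getAccessor(dataType: DataType, ordinal: Int): InternalRow => Any = dataType match {
  // New: a UDT column gets the same specialized getter as its sqlType would.
  case u: UserDefinedType[_] => getAccessor(u.sqlType, ordinal)
  case BooleanType => (input) => input.getBoolean(ordinal)
  // ... remaining cases as in the diff above ...
  case _ => (input) => input.get(ordinal, dataType)
}
```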


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008527
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -33,28 +33,14 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
 
   override def toString: String = s"input[$ordinal, ${dataType.simpleString}, $nullable]"
 
+  private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
--- End diff --

Do we need to be lazy?
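
The accessor is cheap to construct and is always needed on the interpreted path, so an eager `val` would arguably work just as well. A sketch of the relevant members (assumed shape, including the usual null check):

```scala
// Eager variant: build the accessor once at construction time.
private val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)

override def eval(input: InternalRow): Any =
  if (nullable && input.isNullAt(ordinal)) null else accessor(input)
```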


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r179420439
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala ---
@@ -277,4 +278,31 @@ class ObjectExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       checkEvaluation(decodeUsingSerializer, null, InternalRow.fromSeq(Seq(null)))
     }
   }
+
+  test("LambdaVariable should support interpreted execution") {
+    val elementTypes = Seq(BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType,
+      DoubleType, DecimalType.USER_DEFAULT, StringType, BinaryType, DateType, TimestampType,
+      CalendarIntervalType)
+    val arrayTypes = elementTypes.flatMap { elementType =>
+      Seq(ArrayType(elementType, containsNull = false), ArrayType(elementType, containsNull = true))
+    }
+    val mapTypes = elementTypes.flatMap { elementType =>
+      Seq(MapType(elementType, elementType, false), MapType(elementType, elementType, true))
+    }
+    val structTypes = elementTypes.flatMap { elementType =>
+      Seq(StructType(StructField("col1", elementType, false) :: Nil),
+        StructType(StructField("col1", elementType, true) :: Nil))
+    }
+
+    val acceptedTypes = elementTypes ++ arrayTypes ++ mapTypes ++ structTypes
+    val random = new Random()
--- End diff --

Please set a seed or use `withClue()`; otherwise it will be very annoying to debug these when something goes south.
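
A minimal sketch of the suggestion: fix (or at least surface) the seed so a failing run can be reproduced:

```scala
import scala.util.Random

// Seed the generator and attach the seed to any failure message.
val seed = System.currentTimeMillis()
val random = new Random(seed)
withClue(s"random seed = $seed") {
  // ... generate the random test data and run the checks here ...
}
```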


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r179418102
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -550,11 +550,33 @@ case class LambdaVariable(
     dataType: DataType,
     nullable: Boolean = true) extends LeafExpression with NonSQLExpression {
 
+  private lazy val accessor: InternalRow => Any = dataType match {
--- End diff --

We could make this a bit more generic and also use this for `BoundReference`.
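
That is the shape the later revisions adopt: factor the per-type match into `InternalRow.getAccessor` so `BoundReference` builds its accessor the same way, as in the `BoundAttribute.scala` diff earlier in this thread:

```scala
// Shared construction: the per-type match lives in InternalRow.getAccessor.
private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
```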


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

2018-04-04 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20981

[SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

## What changes were proposed in this pull request?

Currently, interpreted execution of `LambdaVariable` just uses `InternalRow.get` to access an element. We should use specialized accessors where possible.

## How was this patch tested?

Added a test.
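
To illustrate the difference (a sketch with hypothetical inputs; `getAccessor` here follows the `(DataType, Int) => InternalRow => Any` shape of this PR's first revision):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.IntegerType

val row = InternalRow(42)
val ordinal = 0

// Generic path: dispatches on the data type and boxes on every call.
val v1 = row.get(ordinal, IntegerType)

// Accessor path: resolve the data type once, then reuse the specialized getter.
val getValue: InternalRow => Any = InternalRow.getAccessor(IntegerType, ordinal)
val v2 = getValue(row)
```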


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-23873

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20981


commit 35539be14bc79647409bfcdc5dac1fb99d00f5ec
Author: Liang-Chi Hsieh 
Date:   2018-04-05T03:57:17Z

Use accessors in interpreted LambdaVariable.




---
