[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20981

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180685351

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Hehe - good point. Let's leave it here for now.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180684792

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

As `SpecializedGetters` is a Java interface, it seems we can't create a companion object for it? Or am I missing something?
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180583831

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Ok.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180583775

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -65,11 +65,19 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
     checkEvaluationWithOptimization(expr, catalystValue, inputRow)
   }
+
+  private def getActualDataType(dt: DataType): DataType = dt match {
--- End diff --

I can use the method you added.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180475548

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The returned accessor
+   * actually takes a `SpecializedGetters` input because it can be generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
--- End diff --

Perhaps we should move this to the companion object of `SpecializedGetters`?
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180468634

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -65,11 +65,19 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
     checkEvaluationWithOptimization(expr, catalystValue, inputRow)
   }
+
+  private def getActualDataType(dt: DataType): DataType = dt match {
--- End diff --

I added a similar method to the `UserDefinedType` companion. Which one shall we add?
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180423825

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Ok.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180394552

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Well, you could generalize it to take an ordinal?
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180356541

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Ah, no. The accessor in #20984 takes the ordinal as an input parameter.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180356241

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,26 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
--- End diff --

Maybe we can make the accessor accept `SpecializedGetters`, so this can be reused in #20984 too...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180008583

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,25 @@ object InternalRow {
        case v: MapData => v.copy()
        case _ => value
      }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
+    case BooleanType => (input) => input.getBoolean(ordinal)
+    case ByteType => (input) => input.getByte(ordinal)
+    case ShortType => (input) => input.getShort(ordinal)
+    case IntegerType | DateType => (input) => input.getInt(ordinal)
+    case LongType | TimestampType => (input) => input.getLong(ordinal)
+    case FloatType => (input) => input.getFloat(ordinal)
+    case DoubleType => (input) => input.getDouble(ordinal)
+    case StringType => (input) => input.getUTF8String(ordinal)
+    case BinaryType => (input) => input.getBinary(ordinal)
+    case CalendarIntervalType => (input) => input.getInterval(ordinal)
+    case t: DecimalType => (input) => input.getDecimal(ordinal, t.precision, t.scale)
+    case t: StructType => (input) => input.getStruct(ordinal, t.size)
+    case _: ArrayType => (input) => input.getArray(ordinal)
+    case _: MapType => (input) => input.getMap(ordinal)
+    case _ => (input) => input.get(ordinal, dataType)
--- End diff --

Handle `UDT`?
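One way the accessor could handle a UDT (a hedged, self-contained sketch with hypothetical `Toy*` stand-ins for Spark's `DataType`, `UserDefinedType`, and `SpecializedGetters`, not the PR's actual code): since a user-defined type is stored as its underlying `sqlType`, the accessor can recurse into that type instead of falling through to the generic `get`.

```scala
// Hypothetical, simplified stand-ins for Spark's type hierarchy.
sealed trait ToyType
case object ToyIntType extends ToyType
case object ToyStringType extends ToyType
// A "UDT" is physically stored as some underlying SQL type.
final case class ToyUdt(sqlType: ToyType) extends ToyType

trait ToyGetters {
  def getInt(ordinal: Int): Int
  def get(ordinal: Int, t: ToyType): Any // generic, boxing fallback
}

final class ToyRow(values: Array[Any]) extends ToyGetters {
  def getInt(ordinal: Int): Int = values(ordinal).asInstanceOf[Int]
  def get(ordinal: Int, t: ToyType): Any = values(ordinal)
}

def getAccessor(t: ToyType): (ToyGetters, Int) => Any = t match {
  case ToyIntType => (input, ordinal) => input.getInt(ordinal)
  // Recurse into the UDT's underlying storage type rather than
  // hitting the generic fallback below.
  case ToyUdt(sqlType) => getAccessor(sqlType)
  case _ => (input, ordinal) => input.get(ordinal, t)
}

val row = new ToyRow(Array[Any](7))
assert(getAccessor(ToyUdt(ToyIntType))(row, 0) == 7)
```

The recursion means nested UDTs (a UDT whose `sqlType` is itself a UDT) resolve to the innermost physical accessor for free.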
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r180008527

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -33,28 +33,14 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)

   override def toString: String = s"input[$ordinal, ${dataType.simpleString}, $nullable]"

+  private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
--- End diff --

Do we need to be lazy?
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r179420439

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala ---
@@ -277,4 +278,31 @@ class ObjectExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       checkEvaluation(decodeUsingSerializer, null, InternalRow.fromSeq(Seq(null)))
     }
   }
+
+  test("LambdaVariable should support interpreted execution") {
+    val elementTypes = Seq(BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType,
+      DoubleType, DecimalType.USER_DEFAULT, StringType, BinaryType, DateType, TimestampType,
+      CalendarIntervalType)
+    val arrayTypes = elementTypes.flatMap { elementType =>
+      Seq(ArrayType(elementType, containsNull = false), ArrayType(elementType, containsNull = true))
+    }
+    val mapTypes = elementTypes.flatMap { elementType =>
+      Seq(MapType(elementType, elementType, false), MapType(elementType, elementType, true))
+    }
+    val structTypes = elementTypes.flatMap { elementType =>
+      Seq(StructType(StructField("col1", elementType, false) :: Nil),
+        StructType(StructField("col1", elementType, true) :: Nil))
+    }
+
+    val acceptedTypes = elementTypes ++ arrayTypes ++ mapTypes ++ structTypes
+    val random = new Random()
--- End diff --

Please set a seed or use `withClue()`, otherwise it will be very annoying to debug these when something goes south.
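The suggestion above can be sketched as follows (a minimal, self-contained example of seeding `scala.util.Random` and surfacing the seed; `withClue` is a ScalaTest helper and is left out here to keep the sketch dependency-free): pick the seed up front and print it, so a failing randomized run can be replayed by hard-coding the reported seed.

```scala
import scala.util.Random

// Pick a seed, report it, and seed the generator with it so any failure
// can be reproduced by re-running with the printed seed.
val seed = System.currentTimeMillis()
println(s"random seed: $seed")
val random = new Random(seed)

// Deterministic given the seed: two generators seeded identically agree.
val replay = new Random(seed)
assert(random.nextInt() == replay.nextInt())
```

In a ScalaTest suite, the same seed would typically be threaded into the failure message (e.g. via `withClue`) rather than printed unconditionally.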
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20981#discussion_r179418102

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -550,11 +550,33 @@ case class LambdaVariable(
     dataType: DataType,
     nullable: Boolean = true) extends LeafExpression with NonSQLExpression {

+  private lazy val accessor: InternalRow => Any = dataType match {
--- End diff --

We could make this a bit more generic and also use it for `BoundReference`.
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20981

    [SPARK-23873][SQL] Use accessors in interpreted LambdaVariable

## What changes were proposed in this pull request?

Currently, interpreted execution of `LambdaVariable` just uses `InternalRow.get` to access an element. We should use the specialized accessors where possible.

## How was this patch tested?

Added a test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23873

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20981.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20981

commit 35539be14bc79647409bfcdc5dac1fb99d00f5ec
Author: Liang-Chi Hsieh
Date: 2018-04-05T03:57:17Z

    Use accessors in interpreted LambdaVariable.
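The idea behind the change can be sketched as follows (a self-contained toy; the hypothetical `Toy*` names stand in for Spark's `InternalRow`, `SpecializedGetters`, and `DataType`, and this is not the PR's actual code): resolve the type dispatch once when the expression is set up, so per-row evaluation goes straight through a cached closure to a specialized getter instead of the generic, boxing `get`.

```scala
// Hypothetical, simplified stand-ins for Spark's row and type abstractions.
sealed trait ToyType
case object ToyBooleanType extends ToyType
case object ToyLongType extends ToyType
case object ToyOtherType extends ToyType

trait ToyGetters {
  def getBoolean(ordinal: Int): Boolean
  def getLong(ordinal: Int): Long
  def get(ordinal: Int, t: ToyType): Any // generic, boxing fallback
}

final class ToyRow(values: Array[Any]) extends ToyGetters {
  def getBoolean(ordinal: Int): Boolean = values(ordinal).asInstanceOf[Boolean]
  def getLong(ordinal: Int): Long = values(ordinal).asInstanceOf[Long]
  def get(ordinal: Int, t: ToyType): Any = values(ordinal)
}

// The match on the data type runs once; the returned closure is then
// invoked per row and calls the specialized getter directly.
def getAccessor(t: ToyType): (ToyGetters, Int) => Any = t match {
  case ToyBooleanType => (input, ordinal) => input.getBoolean(ordinal)
  case ToyLongType    => (input, ordinal) => input.getLong(ordinal)
  case _              => (input, ordinal) => input.get(ordinal, t)
}

val accessor = getAccessor(ToyLongType) // resolved once, reused per row
val row = new ToyRow(Array[Any](true, 123L))
assert(accessor(row, 1) == 123L)
```

Taking `ToyGetters` rather than `ToyRow` mirrors the review discussion above: the same accessor then works for any class implementing the getter interface, not just rows.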