[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r234085471 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath, Some(t)) +val isCollectionElement = lastType.exists { lt => + lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]] --- End diff -- I added the support for Map --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r229156708 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -128,6 +128,15 @@ object ScalaReflection extends ScalaReflection { case _ => false } + def isValueClass(tpe: `Type`): Boolean = { +tpe.typeSymbol.asClass.isDerivedValueClass + } + + /** Returns the name and type of the underlying parameter of value class `tpe`. */ + def getUnderlyingParameterOf(tpe: `Type`): (String, Type) = { +getConstructorParameters(tpe).head --- End diff -- not sure, I can't find any --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228925856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -128,6 +128,15 @@ object ScalaReflection extends ScalaReflection { case _ => false } + def isValueClass(tpe: `Type`): Boolean = { +tpe.typeSymbol.asClass.isDerivedValueClass + } + + /** Returns the name and type of the underlying parameter of value class `tpe`. */ + def getUnderlyingParameterOf(tpe: `Type`): (String, Type) = { +getConstructorParameters(tpe).head --- End diff -- is there a more official way to get the value class field name? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228879109 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -180,11 +189,13 @@ object ScalaReflection extends ScalaReflection { * @param tpe The `Type` of deserialized object. * @param path The expression which can be used to extract serialized value. * @param walkedTypePath The paths from top to bottom to access current field when deserializing. + * @param instantiateValueClass If `true`, create an instance for Scala value class --- End diff -- ok, I updated the comment --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228866822 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -180,11 +189,13 @@ object ScalaReflection extends ScalaReflection { * @param tpe The `Type` of deserialized object. * @param path The expression which can be used to extract serialized value. * @param walkedTypePath The paths from top to bottom to access current field when deserializing. + * @param instantiateValueClass If `true`, create an instance for Scala value class --- End diff -- it will be good to explain when we need to instantiate value class and why --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228863565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath, Some(t)) +val isCollectionElement = lastType.exists { lt => + lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]] +} +if (lastType.isEmpty || isCollectionElement) { --- End diff -- Yea, I choose the `lastType` approach because it may be useful for other use cases also. But what you said is also true. I changed it to a boolean parameter --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228784274 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath, Some(t)) +val isCollectionElement = lastType.exists { lt => + lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]] +} +if (lastType.isEmpty || isCollectionElement) { --- End diff -- it looks to me that we don't need `lastType`, but just a boolean parameter "needInstantiateValueClass". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228783542 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath, Some(t)) +val isCollectionElement = lastType.exists { lt => + lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]] --- End diff -- how about map? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228783130 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -184,7 +193,8 @@ object ScalaReflection extends ScalaReflection { private def deserializerFor( tpe: `Type`, path: Expression, - walkedTypePath: Seq[String]): Expression = cleanUpReflectionObjects { + walkedTypePath: Seq[String], + lastType: Option[Type]): Expression = cleanUpReflectionObjects { --- End diff -- can we add parameter doc for it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228741586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -374,6 +383,29 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath) +if (walkedTypePath.length == 1 || walkedTypePath.head.contains("array element")) { --- End diff -- never mind, I think going with option 2 is better in the future --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228737728 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -374,6 +383,29 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for value class that is top-level or array element, +// if it is used as another type (e.g. as its parent trait or generic), +// the compiler keeps the class so we must provide an instance of the +// class too. In other cases, the compiler will handle wrapping/unwrapping +// for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath) +if (walkedTypePath.length == 1 || walkedTypePath.head.contains("array element")) { --- End diff -- @viirya nice catch! I didn't check for array case. As you said, basically we just need to instantiate value class if it is the type of array element. However, to do that, in the `case` handling value class, I need to know that I'm checking the type of array element. I think there are 2 ways: - check the previous piece of `walkedTypePath` like I'm doing here - introduce a parameter called `previousType: Type` for `deserializeFor` Personally, I think what I did here (option 1) is a little bit hacky but it is minor and works well. Do you have any suggestion on this? 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228734242 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection { "cannot be used as field name\n" + walkedTypePath.mkString("\n")) } + // as a field, value class is represented by its underlying type + val trueFieldType = +if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType + val fieldValue = Invoke( -AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType), -returnNullable = !fieldType.typeSymbol.asClass.isPrimitive) - val clsName = getClassNameFromType(fieldType) +AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType), +returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive) + val clsName = getClassNameFromType(trueFieldType) --- End diff -- Thanks for explaining! In order to cover both value class instantiated and not instantiated cases, I think we may need this special handling. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228734075 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- yes @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733929 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- So for Dataset, can we conclude that if a value class is top-level, it will be instantiated, but if it is nested field, it will be just underlying type? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733771 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- No in that case `Id` will be `Int` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733678 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- If the `T` is `User`, will `Id` class be instantiated? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733525 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- Oh, sorry I didn't see the identify case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733511 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- hmm, I think the first rule is that a value class is treated as another type. Isn't `T` just `Id`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733409 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- it is the 1st rule where `Id` is treated as a type parameter, you can see the `def identity[T]` example --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733287 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- > For example, in method encodeDecodeTest, if we pass an instance of Id as input, it will not be converted to Int. Why in this case, `Id` won't be a Int? It seems to not be in the three rules. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733161 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +val (_, underlyingType) = getUnderlyingParameterOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = getUnerasedClassNameFromType(t) +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +// Nested value class is treated as its underlying type +// because the compiler will convert value class in the schema to +// its underlying type. +// However, for top-level value class, if it is used as another type +// (e.g. as its parent trait or generic), the compiler keeps the class +// so we must provide an instance of the class too. In other cases, +// the compiler will handle wrapping/unwrapping for us automatically. +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { --- End diff -- No, I haven't rebased since last week. Hmm, now since `path` is not `Option` anymore, I think I have to use `if (walkedTypePath.length > 1)` to have the same logic. But this seems a little bit hacky. Do you have any suggestion on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733107 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- yes, but there are some cases where an instance of value class must be created. You can see this [allocation rule](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733087 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala --- @@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc)) } + // test for Scala value class encodeDecodeTest( PrimitiveValueClass(42), "primitive value class") - encodeDecodeTest( ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class") + encodeDecodeTest(StringWrapper("a"), "value class string") + encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested") + encodeDecodeTest( --- End diff -- sure, I added a test with `StringWrapper(null)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228732860 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- So after Scala compile, value class will be its underlying type? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228732805 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) + case t if isValueClass(t) => +// nested value class is treated as its underlying type +// top level value class must be treated as a product +val underlyingType = getUnderlyingTypeOf(t) +val underlyingClsName = getClassNameFromType(underlyingType) +val clsName = t.typeSymbol.asClass.fullName +val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: + walkedTypePath + +val arg = deserializerFor(underlyingType, path, newTypePath) +if (path.isDefined) { + arg --- End diff -- @viirya we cannot because after compile, field `id: Id` becomes `id: Int` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731507

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
```
--- End diff --

I have the same question. When we deserialize a value class, don't we always have `NewInstance` to construct it back? When constructing `User`, can't we pass in an `Id`?
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731280

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
```
--- End diff --

yup, a value class in general cannot be null since it is a subtype of `AnyVal`
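A quick way to see why `nullable = false` is right for `wrappedInt`: the field compiles down to the underlying primitive, which has no null value at all. A sketch with re-declared stand-ins for the test fixtures (`IntWrapper`/`ValueClassData` here are illustrative copies, not the PR's actual `TestingValueClass` members):

```scala
// Sketch: a value class over a primitive erases to that primitive,
// so the compiled field cannot hold null.
object ValueClassNullabilityDemo {
  case class IntWrapper(i: Int) extends AnyVal          // illustrative copy
  case class ValueClassData(intField: Int, wrappedInt: IntWrapper)

  def main(args: Array[String]): Unit = {
    val t = classOf[ValueClassData].getDeclaredField("wrappedInt").getType
    println(t.isPrimitive) // prints "true": a JVM int, no null possible
    // Note: `val w: IntWrapper = null` would not even compile,
    // since IntWrapper <: AnyVal.
  }
}
```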
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731247

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
```
--- End diff --

did you rebase? I think `path` is not an `Option` anymore.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731209

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala ---
```
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTest
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }

+  // test for Scala value class
   encodeDecodeTest(PrimitiveValueClass(42), "primitive value class")
-  encodeDecodeTest(ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
```
--- End diff --

can we also test with null values?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228730753

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
```
--- End diff --

to confirm: a Scala value class for a primitive type can't be null?
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228716352

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

+      // as a field, value class is represented by its underlying type
+      val trueFieldType =
+        if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
       val fieldValue = Invoke(
-        AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-        returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-      val clsName = getClassNameFromType(fieldType)
+        AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+        returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+      val clsName = getClassNameFromType(trueFieldType)
```
--- End diff --

I tried moving the special logic into the value class case, but there is a concern I don't know how to resolve yet. I need to change `dataTypeFor` to return `ObjectType` for a top-level value class and `dataTypeFor(underlyingType)` otherwise (see my [comment](https://github.com/apache/spark/pull/22309#discussion_r226142827)). I'm going with something like this:

```scala
private def dataTypeFor(tpe: `Type`, isTopLevelValueClass: Boolean = true)
```

but this isn't right because:
- the default value `true` doesn't make sense for other types
- if the default is `false`, or there is no default value, many call sites of this method need to be changed
- it also feels clunky, because `dataTypeFor` now has to be aware of the context of its parameter

Do you have any suggestion on this?
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228713038

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -622,6 +654,14 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "serialize", udt, inputObject :: Nil)
+      case t if isValueClass(t) =>
+        val (name, underlyingType) = getConstructorParameters(t).head
```
--- End diff --

you're right, I should return both the name and the type from `getUnderlyingTypeOf`
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228713022

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -128,6 +128,16 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
+
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the underlying type of value class `cls`. */
+  def getUnderlyingTypeOf(cls: `Type`): `Type` = {
```
--- End diff --

agree, it's better to use `tpe`
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227640284

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

+      // as a field, value class is represented by its underlying type
+      val trueFieldType =
+        if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
       val fieldValue = Invoke(
-        AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-        returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-      val clsName = getClassNameFromType(fieldType)
+        AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+        returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+      val clsName = getClassNameFromType(trueFieldType)
```
--- End diff --

Why do we need such special handling? There is new serialization handling for value classes added above; can't we simply get the object of the value class here and let the recursive call of `serializerFor` handle it?
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227638126

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -128,6 +128,16 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
+
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the underlying type of value class `cls`. */
+  def getUnderlyingTypeOf(cls: `Type`): `Type` = {
```
--- End diff --

nit: it is usual to name a `Type` parameter `tpe` in ScalaReflection.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227638782

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -622,6 +654,14 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "serialize", udt, inputObject :: Nil)
+      case t if isValueClass(t) =>
+        val (name, underlyingType) = getConstructorParameters(t).head
```
--- End diff --

Can we use `getUnderlyingTypeOf` consistently? Let `getUnderlyingTypeOf` return both the parameter name and type. Or just use `getConstructorParameters` and get rid of `getUnderlyingTypeOf`?
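The merged helper viirya suggests could look roughly like this. This is only a sketch against `scala.reflect.runtime.universe`, not the PR's actual code (which goes through Catalyst's own `getConstructorParameters`); the `Id` class is re-declared for illustration:

```scala
import scala.reflect.runtime.universe._

object UnderlyingParamDemo {
  case class Id(value: Int) extends AnyVal  // illustrative value class

  // Sketch of a `getUnderlyingParameterOf` that returns both the
  // constructor parameter's name and its type, as suggested in the review.
  def getUnderlyingParameterOf(tpe: Type): (String, Type) = {
    val param = tpe.typeSymbol.asClass.primaryConstructor
      .asMethod.paramLists.head.head
    (param.name.toString, param.typeSignature)
  }

  def main(args: Array[String]): Unit = {
    val (name, underlying) = getUnderlyingParameterOf(typeOf[Id])
    println(name)                       // prints "value"
    println(underlying =:= typeOf[Int]) // prints "true"
  }
}
```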
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226510002

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -632,13 +667,17 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

+      // as a field, value class is represented by its underlying type
+      val trueFieldType =
+        if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
```
--- End diff --

updating `dataTypeFor` is not enough, since `trueFieldType` is used in the code below as well. We could move this into `getConstructorParameters`, but then the special handling is still there, so it would not be cleaner.
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226509492

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
```
--- End diff --

Here, what I mean by "treated as a product" is that we must create a new instance for it, in case we have `Dataset[Id]` for example (as in my [comment](https://github.com/apache/spark/pull/22309#discussion_r226142827) above). Does this seem confusing? Should I reword it?
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226508827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
```
--- End diff --

Sure, here is the message when running with the value class `Id` above:

`- Scala value class: org.apache.spark.sql.catalyst.encoders.Id(scala.Int)`
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226149168

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -632,13 +667,17 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

+      // as a field, value class is represented by its underlying type
+      val trueFieldType =
+        if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
```
--- End diff --

shall we update `dataTypeFor` to handle value classes, instead of special-handling them here?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226148985

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
```
--- End diff --

do you have an example of this message?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226148892

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
```
--- End diff --

This is not true: a top-level value class can be a single primitive-type column.
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226144119

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -125,6 +125,17 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
+
+  def isValueClass(tpe: `Type`): Boolean = {
+    val notNull = !(tpe <:< localTypeOf[Null])
+    notNull && definedByConstructorParams(tpe) && tpe <:< localTypeOf[AnyVal]
```
--- End diff --

No, and now that you ask, I realize that the `Product` constraint can be completely removed. Also, after scanning through the type API, I found a built-in way to check for a value class (`isDerivedValueClass`)... Don't know why I never thought about this 🤦‍♂️
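The built-in check mt40 found is `ClassSymbol.isDerivedValueClass` from `scala.reflect`. A quick sketch of how it behaves (the class names here are illustrative, not from the PR):

```scala
import scala.reflect.runtime.universe._

object DerivedValueClassDemo {
  case class Id(value: Int) extends AnyVal  // a user-defined value class
  case class Plain(value: Int)              // an ordinary case class

  def isValueClass(tpe: Type): Boolean =
    tpe.typeSymbol.asClass.isDerivedValueClass

  def main(args: Array[String]): Unit = {
    println(isValueClass(typeOf[Id]))    // prints "true"
    println(isValueClass(typeOf[Plain])) // prints "false"
    // Built-in primitives extend AnyVal but are not *derived* value classes:
    println(isValueClass(typeOf[Int]))   // prints "false"
  }
}
```

Note the last case: the reflection API distinguishes user-derived value classes from the built-in `AnyVal` primitives, which is exactly what the PR's `isValueClass` needs.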
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226142977

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
```
--- End diff --

I cannot use that because it returns the class name after erasure (e.g. it returns `Int` for `IntWrapper`). I'll create a separate method to make this clearer.
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r226142827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
```
--- End diff --

Take the class `User` above for example. After compilation, the field `id` of type `Id` becomes `Int`, so when constructing `User` we need `id` to be an `Int`. Also, why do we still need `NewInstance` when `Id` is itself the schema? Because `Id` may remain an `Id` if it is treated as another type (following the [allocation rules](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details)). For example, in the method [encodeDecodeTest](https://github.com/apache/spark/blob/a40806d2bd84e9a0308165f0d6c97e9cf00aa4a3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala#L373), if we pass an instance of `Id` as input, it will not be converted to `Int`. In the other case, when the required type is explicitly `Id`, both the input and the result returned from deserialization become `Int`.
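The allocation rules mt40 cites can be observed directly: a value class stays boxed when used as a supertype or as an array element, which is why the deserializer must be able to produce a real instance in those positions. A minimal sketch, with `Id` re-declared for illustration:

```scala
// Sketch of Scala's value-class allocation rules:
// boxing happens when the static type is not the value class itself.
object AllocationRulesDemo {
  case class Id(value: Int) extends AnyVal  // illustrative value class

  def main(args: Array[String]): Unit = {
    // Used as `Any` (i.e. as another type), an Id instance is allocated:
    val boxed: Any = Id(1)
    println(boxed.getClass.getSimpleName) // prints "Id"

    // Array elements of a value-class type are boxed instances too:
    val arr = Array(Id(1), Id(2))
    println(arr.getClass.getComponentType.getSimpleName) // prints "Id"

    // But a local or field of static type Id is erased to the
    // underlying Int; no Id instance exists there at runtime.
  }
}
```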
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r225030497

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -125,6 +125,17 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
+
+  def isValueClass(tpe: `Type`): Boolean = {
+    val notNull = !(tpe <:< localTypeOf[Null])
+    notNull && definedByConstructorParams(tpe) && tpe <:< localTypeOf[AnyVal]
```
--- End diff --

why must a value class be a `Product`? Is it because there is otherwise no way to get the value class field name and type?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r225030384

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
```
--- End diff --

how about `getClassNameFromType(t)`?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r225030352

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
```
--- End diff --

why can we skip the `NewInstance`?
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224951179

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }

+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
+  case class StrWrapper(s: String) extends AnyVal
+
+  case class ValueClassData(
+    intField: Int,
+    wrappedInt: IntWrapper,
+    strField: String,
+    wrappedStr: StrWrapper)
```
--- End diff --

Right, I added some comments for this
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224944923

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }

+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
```
--- End diff --

It doesn't, but Spark only supports `case class` (not plain `class`) for the schema type, so I keep it that way. Child columns can be `class`, though. I think adding that in the future on top of this is not difficult.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224318955

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }

+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
+  case class StrWrapper(s: String) extends AnyVal
+
+  case class ValueClassData(
+    intField: Int,
+    wrappedInt: IntWrapper,
+    strField: String,
+    wrappedStr: StrWrapper)
```
--- End diff --

We might need a comment to describe what this class looks like in Java. It seems to have 2 int fields, `intField` and `wrappedInt`, and 2 string fields, `strField` and `wrappedStr`. I'm not sure it is the same in Scala 2.12, though.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224300113

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
```
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }

+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
```
--- End diff --

must a value class be a case class?
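For cloud-fan's question: the language itself does not require `case`. Any class with a single public `val` parameter extending `AnyVal` is a value class; the `case class` restriction discussed in this thread comes from what Spark supports for schema types, not from Scala. A hedged sketch with illustrative names:

```scala
import scala.reflect.runtime.universe._

object PlainValueClassDemo {
  // A value class does not have to be a case class:
  class Meters(val count: Int) extends AnyVal      // plain value class
  case class MetersCase(count: Int) extends AnyVal // case value class

  def main(args: Array[String]): Unit = {
    // Both are recognized as derived value classes by scala.reflect.
    println(typeOf[Meters].typeSymbol.asClass.isDerivedValueClass)     // prints "true"
    println(typeOf[MetersCase].typeSymbol.asClass.isDerivedValueClass) // prints "true"
  }
}
```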