[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-11-15 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r234085471
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]
--- End diff --

I added support for `Map`.
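
For readers following along, a standalone sketch of what the extended check can look like; the PR uses ScalaReflection's `localTypeOf`, while plain `typeOf` is used here so the snippet compiles on its own:

```scala
import scala.reflect.runtime.universe._

// Sketch only: mirror the Array/Seq subtype checks above and add Map.
def isCollectionType(lt: Type): Boolean =
  lt <:< typeOf[Array[_]] ||
    lt <:< typeOf[Seq[_]] ||
    lt <:< typeOf[Map[_, _]]

// isCollectionType(typeOf[Map[String, Int]])  // true
```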


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-29 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r229156708
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -128,6 +128,15 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the name and type of the underlying parameter of value class `tpe`. */
+  def getUnderlyingParameterOf(tpe: `Type`): (String, Type) = {
+    getConstructorParameters(tpe).head
--- End diff --

not sure, I can't find any
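
For context, a standalone sketch of the only route that seems available: a value class has exactly one primary-constructor parameter, and that parameter is its underlying field (this is what `getConstructorParameters(tpe).head` relies on). The helper name here is hypothetical:

```scala
import scala.reflect.runtime.universe._

object ValueClassReflection {
  case class IntWrapper(i: Int) extends AnyVal

  // Sketch: read the single primary-constructor parameter via scala-reflect.
  def underlyingParameter(tpe: Type): (String, Type) = {
    val param = tpe.decls.collectFirst {
      case m: MethodSymbol if m.isPrimaryConstructor => m.paramLists.head.head
    }.get
    (param.name.toString, param.typeSignature)
  }

  // underlyingParameter(typeOf[IntWrapper])  // ("i", Int)
}
```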


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228925856
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -128,6 +128,15 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the name and type of the underlying parameter of value class `tpe`. */
+  def getUnderlyingParameterOf(tpe: `Type`): (String, Type) = {
+    getConstructorParameters(tpe).head
--- End diff --

is there a more official way to get the value class field name?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-29 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228879109
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -180,11 +189,13 @@ object ScalaReflection extends ScalaReflection {
    * @param tpe The `Type` of deserialized object.
    * @param path The expression which can be used to extract serialized value.
    * @param walkedTypePath The paths from top to bottom to access current field when deserializing.
+   * @param instantiateValueClass If `true`, create an instance for Scala value class
--- End diff --

ok, I updated the comment


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228866822
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -180,11 +189,13 @@ object ScalaReflection extends ScalaReflection {
    * @param tpe The `Type` of deserialized object.
    * @param path The expression which can be used to extract serialized value.
    * @param walkedTypePath The paths from top to bottom to access current field when deserializing.
+   * @param instantiateValueClass If `true`, create an instance for Scala value class
--- End diff --

It would be good to explain when we need to instantiate the value class, and why.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-29 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228863565
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]
+        }
+        if (lastType.isEmpty || isCollectionElement) {
--- End diff --

Yes, I chose the `lastType` approach because it may be useful for other use cases too. But what you said is also true, so I changed it to a boolean parameter.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228784274
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]
+        }
+        if (lastType.isEmpty || isCollectionElement) {
--- End diff --

it looks to me that we don't need `lastType`, but just a boolean parameter 
"needInstantiateValueClass".


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228783542
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]
--- End diff --

how about map?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228783130
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -184,7 +193,8 @@ object ScalaReflection extends ScalaReflection {
   private def deserializerFor(
       tpe: `Type`,
       path: Expression,
-      walkedTypePath: Seq[String]): Expression = cleanUpReflectionObjects {
+      walkedTypePath: Seq[String],
+      lastType: Option[Type]): Expression = cleanUpReflectionObjects {
--- End diff --

can we add parameter doc for it?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-28 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228741586
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -374,6 +383,29 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (walkedTypePath.length == 1 || walkedTypePath.head.contains("array element")) {
--- End diff --

Never mind, I think going with option 2 is better for the future.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-28 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228737728
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -374,6 +383,29 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (walkedTypePath.length == 1 || walkedTypePath.head.contains("array element")) {
--- End diff --

@viirya nice catch! I didn't check for the array case.
As you said, basically we just need to instantiate the value class if it is the
type of an array element. However, to do that, in the `case` handling the value
class, I need to know that I'm checking the type of an array element. I think
there are 2 ways:
- check the previous piece of `walkedTypePath` like I'm doing here
- introduce a parameter called `previousType: Type` for `deserializerFor`

Personally, I think what I did here (option 1) is a little bit hacky, but it is
minor and works well. Do you have any suggestion on this?
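
To make option 1 concrete, a hedged illustration of the state it inspects; the exact path strings below are an approximation of how `walkedTypePath` entries are built, not verbatim output:

```scala
// Hypothetical walkedTypePath while deserializing the element of an Array[Id]:
val walkedTypePath = Seq(
  "- array element class: Id", // head: the most recent step
  "- root class: Array[Id]")

// Option 1: detect the array-element context from the previous path entry.
val isArrayElement = walkedTypePath.length > 1 &&
  walkedTypePath.head.contains("array element")
```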


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228734242
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

Thanks for explaining! To cover both the instantiated and non-instantiated value class cases, I think we may need this special handling.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228734075
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

yes @viirya 
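
An illustration of that conclusion, using the `Id`/`User` classes discussed in this thread:

```scala
case class Id(value: Int) extends AnyVal
case class User(id: Id, name: String)

// Top-level: a Dataset[Id] must hand back real Id instances, so the
// deserializer needs a NewInstance step for the value class.
// Nested: User's compiled constructor takes an Int (the field id: Id erases
// to Int), so the deserializer passes the underlying Int straight through,
// and the schema stores id as IntegerType.
```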


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733929
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

So for Dataset, can we conclude that if a value class is top-level it will be instantiated, but if it is a nested field it will just be the underlying type?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733771
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

No, in that case `Id` will be `Int`.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733678
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

If `T` is `User`, will the `Id` class be instantiated?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Oh, sorry, I didn't see the `identity` case.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733511
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

hmm, I think the first rule is that a value class is treated as another 
type. Isn't `T` just `Id`?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733409
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

It is the 1st rule, where `Id` is treated as a type parameter; see the `def identity[T]` example.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733287
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

> For example, in method encodeDecodeTest, if we pass an instance of Id as 
input, it will not be converted to Int.

Why, in this case, won't `Id` be an `Int`? It doesn't seem to be covered by the three rules.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733161
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
--- End diff --

No, I haven't rebased since last week.
Hmm, now since `path` is not `Option` anymore, I think I have to use `if 
(walkedTypePath.length > 1)` to have the same logic. But this seems a little 
bit hacky. Do you have any suggestion on this?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733107
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Yes, but there are some cases where an instance of the value class must be created; see these [allocation rules](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details).
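
The linked rules, condensed into an illustrative snippet:

```scala
object AllocationExamples {
  case class Id(value: Int) extends AnyVal

  val asAnotherType: Any = Id(1)         // 1. treated as another type => Id is allocated
  val elements: Array[Id] = Array(Id(2)) // 2. assigned to an array => Id is allocated
  def unwrap(x: Id): Int = x match {     // 3. runtime type test => Id is allocated
    case Id(v) => v
  }
  val plain: Int = Id(3).value           // ordinary use => no allocation, just an Int
}
```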


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733087
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ---
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }
 
+  // test for Scala value class
   encodeDecodeTest(
     PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
     ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
--- End diff --

sure, I added a test with `StringWrapper(null)`
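
For the record, a hedged sketch of what that test looks like in the suite's `encodeDecodeTest` style (the exact test name and shape are assumptions):

```scala
// Hypothetical: null inside the reference-type value class.
encodeDecodeTest(ValueContainer(1, StringWrapper(null)), "value class with null")
```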


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228732860
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

So after Scala compilation, a value class becomes its underlying type?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228732805
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

@viirya we cannot, because after compilation the field `id: Id` becomes `id: Int`.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731507
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

I have the same question. When we deserialize a value class, don't we always have `NewInstance` to construct it back? When constructing `User`, can't we pass in an `Id`?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731280
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
--- End diff --

Yup, a value class in general cannot be null, since it is a subtype of `AnyVal`.
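
A two-line illustration:

```scala
case class IntWrapper(i: Int) extends AnyVal

// val w: IntWrapper = null  // does not compile: Null is not a subtype of AnyVal
```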


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731247
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
--- End diff --

did you rebase? I think `path` is not `Option` anymore.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731209
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ---
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }
 
+  // test for Scala value class
   encodeDecodeTest(
     PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
     ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
--- End diff --

can we also test with null values?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228730753
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
--- End diff --

To confirm: a Scala value class over a primitive type can't be null?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228716352
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

I tried moving the special logic to the value class case, but I have a concern I
don't know how to resolve yet. I need to change `dataTypeFor` to return
`ObjectType` for a top-level value class and `dataTypeFor(underlyingType)`
otherwise (see my
[comment](https://github.com/apache/spark/pull/22309#discussion_r226142827)).
I'm going with something like this:
```scala
private def dataTypeFor(tpe: `Type`, isTopLevelValueClass: Boolean = true)
```
but this isn't right because:
- the default value `true` doesn't make sense for other types
- if the default is `false`, or there is no default value, many places that call
this method need to be changed
- it also feels clunky because `dataTypeFor` now has to be aware of the context
of its parameter

Do you have any suggestion on this?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228713038
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -622,6 +654,14 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "serialize", udt, inputObject :: Nil)
 
+      case t if isValueClass(t) =>
+        val (name, underlyingType) = getConstructorParameters(t).head
--- End diff --

You're right, I should return both the name and the type from `getUnderlyingTypeOf`.
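
For reference, the combined helper this became, as quoted in the later diff in this thread:

```scala
/** Returns the name and type of the underlying parameter of value class `tpe`. */
def getUnderlyingParameterOf(tpe: `Type`): (String, Type) = {
  getConstructorParameters(tpe).head
}
```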


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228713022
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -128,6 +128,16 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the underlying type of value class `cls`. */
+  def getUnderlyingTypeOf(cls: `Type`): `Type` = {
--- End diff --

agree, it's better to use `tpe`


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r227640284
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

Why do we need such special handling? There is new serialization handling for value classes added above; can't we simply get the value class object here and let the recursive call of `serializerFor` handle it?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r227638126
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -128,6 +128,16 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    tpe.typeSymbol.asClass.isDerivedValueClass
+  }
+
+  /** Returns the underlying type of value class `cls`. */
+  def getUnderlyingTypeOf(cls: `Type`): `Type` = {
--- End diff --

nit: a `Type` is usually named `tpe` in ScalaReflection.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r227638782
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -622,6 +654,14 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "serialize", udt, inputObject :: Nil)
 
+      case t if isValueClass(t) =>
+        val (name, underlyingType) = getConstructorParameters(t).head
--- End diff --

Can we use `getUnderlyingTypeOf` consistently? Let `getUnderlyingTypeOf` 
return both parameter name and type.

Or just use `getConstructorParameters` and get rid of `getUnderlyingTypeOf`?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-18 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226510002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -632,13 +667,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
--- End diff --

Updating `dataTypeFor` is not enough, since `trueFieldType` is used in the code below as well. We could move this into `getConstructorParameters`, but then this special handling is still there, so it would not be cleaner.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-18 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226509492
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
--- End diff --

What I mean here by "treated as a product" is that we must create a new instance for it when we have, for example, `Dataset[Id]` (as in my [comment](https://github.com/apache/spark/pull/22309#discussion_r226142827) above). It seems this is confusing; should I reword it?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-18 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226508827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
--- End diff --

Sure, here is the message when running with value class `Id` above
`- Scala value class: org.apache.spark.sql.catalyst.encoders.Id(scala.Int)`


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226149168
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -632,13 +667,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
--- End diff --

Shall we update `dataTypeFor` to handle value classes, instead of special-casing them here?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226148985
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
--- End diff --

do you have an example of this message?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226148892
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
--- End diff --

This is not true; a top-level value class can be a single primitive-type column.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226144119
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -125,6 +125,17 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    val notNull = !(tpe <:< localTypeOf[Null])
+    notNull && definedByConstructorParams(tpe) && tpe <:< localTypeOf[AnyVal]
--- End diff --

No; now that you ask, I realize that the `Product` constraint can be completely removed.

Also, after scanning through the type API, I found a built-in way (`isDerivedValueClass`) to check for a value class... Don't know why I never thought of this 🤦‍♂️
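
The resulting check, as it appears in the updated diff; `isDerivedValueClass` is part of scala-reflect's `ClassSymbol` API:

```scala
def isValueClass(tpe: `Type`): Boolean = {
  tpe.typeSymbol.asClass.isDerivedValueClass
}
```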


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226142977
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
--- End diff --

I cannot use that because it returns the class name after erasure (e.g. it 
returns `Int` for `IntWrapper`). I'll create a separate method to make this 
clearer.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226142827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Take the class `User` above, for example. After compilation, the field `id` of type `Id` becomes `Int`, so when constructing `User` we need `id` to be an `Int`.

Also, why do we need `NewInstance` when `Id` is itself the schema? Because `Id` may remain an `Id` if it is treated as another type (following the [allocation rules](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details)). For example, in the method [encodeDecodeTest](https://github.com/apache/spark/blob/a40806d2bd84e9a0308165f0d6c97e9cf00aa4a3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala#L373), if we pass an instance of `Id` as input, it will not be converted to `Int`. In the other case, when the required type is explicitly `Id`, both the input and the result returned from deserialization become `Int`.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r225030497
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -125,6 +125,17 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    val notNull = !(tpe <:< localTypeOf[Null])
+    notNull && definedByConstructorParams(tpe) && tpe <:< localTypeOf[AnyVal]
--- End diff --

Why must a value class be a `Product`? Is it because there is no way to get the value class field name and type?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r225030384
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
--- End diff --

how about `getClassNameFromType(t)`?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r225030352
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

why can we skip the `NewInstance`?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-13 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224951179
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
+  case class StrWrapper(s: String) extends AnyVal
+
+  case class ValueClassData(
+    intField: Int,
+    wrappedInt: IntWrapper,
+    strField: String,
+    wrappedStr: StrWrapper)
--- End diff --

Right, I added some comments for this


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-12 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224944923
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
--- End diff --

It doesn't, but Spark only supports `case class` (not plain `class`) for the schema type, so I kept it that way.
Child columns can be plain `class`es, though. I think adding that on top of this in the future is not difficult.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224318955
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
+  case class StrWrapper(s: String) extends AnyVal
+
+  case class ValueClassData(
+    intField: Int,
+    wrappedInt: IntWrapper,
+    strField: String,
+    wrappedStr: StrWrapper)
--- End diff --

We might need a comment to describe what this class looks like in Java. It seems to have 2 int fields (`intField`, `wrappedInt`) and 2 string fields (`strField`, `wrappedStr`). I'm not sure it is the same in Scala 2.12, though.
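
Something like this sketch, perhaps; `wrappedInt` erasing to `Int` follows from the schema test elsewhere in this thread (where it maps to `IntegerType`), while treating `wrappedStr` as a plain `String` is an assumption:

```scala
// Roughly what ValueClassData looks like after erasure:
case class ValueClassDataErased(
  intField: Int,
  wrappedInt: Int,      // IntWrapper(i: Int) erased to its underlying Int
  strField: String,
  wrappedStr: String)   // StrWrapper(s: String) erased to its underlying String
```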


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224300113
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
--- End diff --

Does a value class have to be a case class?


---
