GitHub user michalsenkyr opened a pull request:
https://github.com/apache/spark/pull/20505
[SPARK-23251][SQL] Add checks for collection element Encoders
Implicit methods of `SQLImplicits` providing Encoders for collections did not
check for Encoders for their elements. This resulted in unrelated error messages
at run time instead of proper failures at compile time.
## What changes were proposed in this pull request?
The type parameters and implicit parameters of the `Seq`, `Map` and `Set` `Encoder`
providers were modified so that an `Encoder` for the element type is required, making
the element check happen during implicit resolution at compile time.
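For illustration, here is a minimal, self-contained sketch of the idea (plain Scala with a simplified stand-in `Encoder` trait and made-up method names, not the actual `SQLImplicits` signatures from this PR): requiring an implicit element `Encoder` on the collection provider rejects unsupported element types during compilation.

```scala
// Simplified stand-in for org.apache.spark.sql.Encoder; not Spark code.
trait Encoder[T]

object Encoders {
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {}
  implicit val intEncoder: Encoder[Int] = new Encoder[Int] {}

  // Before: the provider ignores the element type, so Encoder[Seq[Any]]
  // resolves at compile time and only fails later, at run time.
  // implicit def seqEncoder[T <: Seq[_]]: Encoder[T] = new Encoder[T] {}

  // After: the provider requires an Encoder for the element type E, so an
  // unsupported element type is rejected during compilation.
  implicit def seqEncoder[E](implicit element: Encoder[E]): Encoder[Seq[E]] =
    new Encoder[Seq[E]] {}
}

object Demo extends App {
  import Encoders._

  implicitly[Encoder[Seq[String]]]  // compiles: an Encoder[String] is in scope
  // implicitly[Encoder[Seq[Any]]]  // does not compile: no Encoder[Any] available
}
```

The PR applies the same idea to the `Seq`, `Map` and `Set` providers in `SQLImplicits`.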
Unfortunately, this does not produce the desired "Unable to find encoder
for type stored in a Dataset" error, because the Scala compiler generates a different
error when an implicit cannot be found due to its own missing implicit arguments (see
[ContextErrors](https://github.com/scala/scala/blob/v2.11.12/src/compiler/scala/tools/nsc/typechecker/ContextErrors.scala#L77));
`@implicitNotFound` is not consulted in this case.
## How was this patch tested?
New behavior:
```scala
scala> implicitly[Encoder[Map[String, Any]]]
<console>:25: error: diverging implicit expansion for type org.apache.spark.sql.Encoder[Map[String,Any]]
starting with method newStringEncoder in class SQLImplicits
       implicitly[Encoder[Map[String, Any]]]
                 ^
```
Old behavior:
```scala
scala> implicitly[Encoder[Map[String, Any]]]
java.lang.ClassNotFoundException: scala.Any
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:555)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1211)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1203)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toJava$1.apply(TwoWayCaches.scala:49)
  at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
  at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toJava(TwoWayCaches.scala:44)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.classToJava(JavaMirrors.scala:1203)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:194)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
  at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:700)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:84)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:65)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:64)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:512)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:445)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:445)
  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:434)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
  at org.apache.spark.sql.SQLImplicits.newMapEncoder(SQLImplicits.scala:172)
  ... 49 elided
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/michalsenkyr/spark SPARK-23251
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20505.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20505
----
commit e3d6dbfdf6af274090e3ba6cc5795745600d26c1
Author: Michal Senkyr <mike.senkyr@...>
Date: 2018-02-04T22:08:25Z
Add checks for collection element Encoders
Implicit methods providing Encoders for collections did not check for
element Encoders, making related operations fail during run-time.
commit d2167607874c574961c490f188ca5c618a6864be
Author: Michal Senkyr <mike.senkyr@...>
Date: 2018-02-04T23:15:31Z
Add unit tests
----
---