GitHub user michalsenkyr opened a pull request:
https://github.com/apache/spark/pull/20505
[SPARK-23251][SQL] Add checks for collection element Encoders
Implicit methods of `SQLImplicits` providing Encoders for collections did not
check for Encoders for their elements. This resulted in unrelated error messages
at run time instead of proper failures at compile time.
## What changes were proposed in this pull request?
The type parameters and implicit parameters of the `Seq`, `Map` and `Set` `Encoder`
providers were modified so that an `Encoder` for the element type is required, making
the element check happen during implicit resolution at compile time.
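For illustration, here is a minimal, self-contained sketch of the idea (plain Scala with a simplified stand-in `Encoder` trait and made-up method names, not the actual `SQLImplicits` signatures from this PR): requiring an implicit element `Encoder` on the collection provider rejects unsupported element types during compilation.

```scala
// Simplified stand-in for org.apache.spark.sql.Encoder; not Spark code.
trait Encoder[T]

object Encoders {
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {}
  implicit val intEncoder: Encoder[Int] = new Encoder[Int] {}

  // Before: the provider ignores the element type, so Encoder[Seq[Any]]
  // resolves at compile time and only fails later, at run time.
  // implicit def seqEncoder[T <: Seq[_]]: Encoder[T] = new Encoder[T] {}

  // After: the provider requires an Encoder for the element type E, so an
  // unsupported element type is rejected during compilation.
  implicit def seqEncoder[E](implicit element: Encoder[E]): Encoder[Seq[E]] =
    new Encoder[Seq[E]] {}
}

object Demo extends App {
  import Encoders._

  implicitly[Encoder[Seq[String]]]  // compiles: an Encoder[String] is in scope
  // implicitly[Encoder[Seq[Any]]]  // does not compile: no Encoder[Any] available
}
```

The PR applies the same idea to the `Seq`, `Map` and `Set` providers in `SQLImplicits`.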
Unfortunately, this does not produce the desired "Unable to find encoder
for type stored in a Dataset" error, because the Scala compiler generates a different
error when an implicit cannot be found due to its own missing implicit arguments (see
[ContextErrors](https://github.com/scala/scala/blob/v2.11.12/src/compiler/scala/tools/nsc/typechecker/ContextErrors.scala#L77));
`@implicitNotFound` is not consulted in this case.
## How was this patch tested?
New behavior:
```scala
scala> implicitly[Encoder[Map[String, Any]]]
<console>:25: error: diverging implicit expansion for type org.apache.spark.sql.Encoder[Map[String,Any]]
starting with method newStringEncoder in class SQLImplicits
       implicitly[Encoder[Map[String, Any]]]
                 ^
```
Old behavior:
```scala
scala> implicitly[Encoder[Map[String, Any]]]
java.lang.ClassNotFoundException: scala.Any
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:555)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1211)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1203)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toJava$1.apply(TwoWayCaches.scala:49)
  at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
  at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toJava(TwoWayCaches.scala:44)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.classToJava(JavaMirrors.scala:1203)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:194)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
  at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:700)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:84)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:65)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:64)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:512)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:445)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:445)
  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:434)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
  at org.apache.spark.sql.SQLImplicits.newMapEncoder(SQLImplicits.scala:172)
  ... 49 elided
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/michalsenkyr/spark SPARK-23251
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20505.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20505
----
commit e3d6dbfdf6af274090e3ba6cc5795745600d26c1
Author: Michal Senkyr <mike.senkyr@...>
Date: 2018-02-04T22:08:25Z
Add checks for collection element Encoders
Implicit methods providing Encoders for collections did not check for
element Encoders, making related operations fail during run-time.
commit d2167607874c574961c490f188ca5c618a6864be
Author: Michal Senkyr <mike.senkyr@...>
Date: 2018-02-04T23:15:31Z
Add unit tests
----
---