Github user jkleckner commented on the pull request:
https://github.com/apache/spark/pull/4780#issuecomment-76831730
> I wonder if we're hitting some bugs with the serializer's classloader not
> seeing the right classloaders. Some issues like that are fixed in 1.3.0. It may
> still be that this is OK in 1.3, but that's just a guess right now.
>
> Remind me where the error occurs? Is it within a stack trace that
> includes the serializer?
Perhaps. Here is a bit of test code that I run, before actually setting up
the RDDs, to provoke the failure.
The stack trace suggests the failure is in the constructor, though.
```scala
val qDigest = new QDigest(256.0)
val s2 = "################ qDigest: " + qDigest.toString()
println(s2)
```

And the resulting stack trace:

```
Exception in thread "main" java.lang.NoClassDefFoundError: it/unimi/dsi/fastutil/longs/Long2LongOpenHashMap
	at com.clearspring.analytics.stream.quantile.QDigest.<init>(QDigest.java:79)
	at com.*********
	at com.*********
	at com.*********
	at com.*********
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(Launcher.java:358)
	... 12 more
```
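For what it's worth, here is a minimal probe (assuming nothing beyond the JDK; `FastutilProbe` is just a name I made up) that checks whether the fastutil class is visible to the current classloader before any `QDigest` is constructed:

```scala
// Minimal probe: check whether fastutil is visible to the classloader
// that will construct QDigest. If this prints "missing", the application
// jar (or --jars) does not bundle fastutil, which would explain the
// NoClassDefFoundError above.
object FastutilProbe {
  def main(args: Array[String]): Unit = {
    val cls = "it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"
    val status =
      try { Class.forName(cls); "present" }
      catch { case _: ClassNotFoundException => "missing" }
    println(s"$cls: $status")
  }
}
```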
> Spark doesn't build with `minimizeJar`. I was referring to the
> parquet-column package. Spark should not, philosophically, be depended upon to
> provide anything but Spark (well, and anything third party that's necessary to
> invoke it). Indeed a lot of issues here descend from the fact that things
> aren't shaded and conflict.
Ah yes, parquet-column was the one with minimizeJar, not Spark.
> classpath-first is supposed to be a mechanism to work around this no
> matter if the conflict came from elsewhere. And if it isn't, that needs to be
> fixed ideally, as a first priority.
Makes sense.
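In case it helps anyone else who hits this, a sketch of the classpath-first knobs as I understand them for 1.3 (the property names are from the Spark configuration docs; whether they actually resolve this particular conflict is an assumption on my part, not something I've verified):

```scala
import org.apache.spark.SparkConf

// Sketch: ask Spark (1.3+) to prefer classes from the user's jar over
// Spark's own bundled copies when the two conflict.
val conf = new SparkConf()
  .setAppName("qdigest-test")
  .set("spark.driver.userClassPathFirst", "true")   // driver-side loading
  .set("spark.executor.userClassPathFirst", "true") // executor-side loading
```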