Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4780#issuecomment-76778163
Darn, I had hoped that was the answer, since it really seems like
prioritizing your classes should mean your copy of CA, which definitely sits
next to all of fastutil, is used. Yes, that's what is supposed to happen.
I wonder if we're hitting a bug where the _serializer's_ classloader doesn't
see the right classloaders. Some issues like that are fixed in 1.3.0, so this
may already work in 1.3, but that's just a guess right now.
Remind me where the error occurs? Is it within a stack trace that includes
the serializer?
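One way to narrow this down is to print which classloader defined the
conflicting class and which jar it came from, once on the driver and once
inside a task. This is only a minimal sketch using plain JDK calls; the
`ClassOrigin` name is hypothetical and nothing here is Spark API.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /** Reports the defining classloader and code source of a class. */
    public static String describe(String className, ClassLoader loader)
            throws ClassNotFoundException {
        // Load without initializing, starting the lookup from the given loader.
        Class<?> cls = Class.forName(className, false, loader);
        // A null defining loader means the bootstrap loader defined the class.
        ClassLoader definingLoader = cls.getClassLoader();
        // The code source can be null, e.g. for JDK classes.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "<unknown>"
                : source.getLocation().toString();
        String loaderName = definingLoader == null
                ? "<bootstrap>"
                : definingLoader.toString();
        return className + " defined by " + loaderName + " from " + location;
    }

    public static void main(String[] args) throws Exception {
        // Pass the fully qualified name of the class that conflicts.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println(describe(name, context));
    }
}
```

If the location reported inside a task points at the Spark assembly rather
than your app jar, the class is not coming from your copy.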
Spark doesn't build with `minimizeJar`. I was referring to the
`parquet-column` package. Spark should not, philosophically, be depended upon
to provide anything but Spark (well, and anything third party that's necessary
to invoke it). Indeed a lot of issues here descend from the fact that things
aren't shaded and conflict.
There's a good argument that just about everything Spark (and Hadoop) uses
should be shaded. It wasn't, and changing it at this point might be
disruptive, but in theory you can use the classpath-first properties to
overrule whatever is in Spark anyway.
I don't think you need `minimizeJar` to make things work, but it would help
to make your app jar much smaller.
I don't think the HyperLogLog usage is relevant per se. If you mean, should
that just be shaded in Spark: that should resolve the conflict that appears
to be at the root of this, but only if we assume that Spark is the only thing
in the whole runtime classpath that includes CA. I don't even know that.
classpath-first is supposed to be a mechanism to work around this no matter
where the conflict comes from. And if it doesn't work, that ideally needs to
be fixed as a first priority.
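For reference, the behavior "classpath-first" is meant to give is child-first
delegation: look in the user's jars before asking the parent loader. The
sketch below illustrates that idea with plain JDK classes only. It is not
Spark's implementation, the `ChildFirstClassLoader` name is an assumption made
for illustration, and Spark's real settings (for example
`spark.executor.userClassPathFirst` in 1.3) are not exercised here.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> cls = findLoadedClass(name);
            if (cls == null) {
                if (name.startsWith("java.")) {
                    // Core classes always come from the parent chain.
                    cls = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: search the user's jars.
                        cls = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the user's jars: fall back to the parent.
                        cls = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(cls);
            }
            return cls;
        }
    }
}
```

With a loader like this, a copy of CA in the app jar would win over one in the
parent's classpath. The catch is that every component that loads classes,
including the serializer, has to go through this loader for it to help.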