Our build is complex: it pulls in a large number of third-party jars and
produces an uber jar that is shaded before we pass it to spark-submit. We
shade to avoid ClassLoader collisions with the Spark platform's own
dependencies (e.g. protobuf 3).

Managing the dependencies and shade rules is cumbersome and error-prone,
and the shade plugin itself takes a long time to run. Some jars (e.g.
Apache Commons libraries) load classes reflectively from the classpath, so
problems do not fail the build; they only show up at runtime.
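
To make the runtime-only failure concrete, here is a minimal sketch of a
check that could run right after the shade step. It assumes we keep a
hand-maintained list of the class names that get looked up reflectively;
the class name ShadedJarSmokeTest and its command-line shape are
hypothetical, and ClassLoader.getPlatformClassLoader() assumes Java 9 or
later. It only verifies that each name resolves against the shaded jar
alone, not that every transitive reference links.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Spark or the shade plugin.
public class ShadedJarSmokeTest {

    /** Returns the names in classNames that cannot be loaded from the jar. */
    static List<String> findMissing(File jar, List<String> classNames) throws IOException {
        URL[] urls = { jar.toURI().toURL() };
        List<String> missing = new ArrayList<>();
        // Parent is the platform loader, so only JDK classes and the jar
        // itself are visible; the build classpath cannot mask a gap.
        try (URLClassLoader loader =
                 new URLClassLoader(urls, ClassLoader.getPlatformClassLoader())) {
            for (String name : classNames) {
                try {
                    // initialize=false: load the class without running
                    // its static initializers.
                    Class.forName(name, false, loader);
                } catch (ClassNotFoundException | LinkageError e) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ShadedJarSmokeTest <jar> <class-name>...");
            System.exit(2);
        }
        List<String> names = Arrays.asList(args).subList(1, args.length);
        List<String> missing = findMissing(new File(args[0]), names);
        for (String name : missing) {
            System.err.println("not loadable from shaded jar: " + name);
        }
        // A non-zero exit code lets the build fail instead of the Spark job.
        System.exit(missing.isEmpty() ? 0 : 1);
    }
}
```

Run as a post-package build step, something like this would move at least
the "class not found by name" failures from the cluster to the build.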

We have tried the spark.{driver,executor}.userClassPathFirst settings,
but they are marked as experimental and sometimes fail.

We are considering implementing our own ClassLoaders and/or rebuilding and
shading the Spark distribution ourselves.
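
For reference, the custom ClassLoader idea would look roughly like the
child-first loader below. This is a minimal sketch under assumptions, not
Spark's own mechanism: the class name ChildFirstClassLoader is
hypothetical, it reverses the usual parent-first delegation only for class
loading, and resource lookup (getResource) is left parent-first.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first (parent-last) class loader.
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core JDK classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: look in our own URLs before the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in our jars: fall back to normal delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

The catch is that any class crossing the boundary between the two loaders
(for example a protobuf message handed to platform code) is a different
Class object on each side, which is the kind of failure we would have to
manage ourselves.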

Are there better alternatives?

-- 
Thanks,
Jason
