Hey Spark Users,

I'm writing a demo with Spark and HBase. What I've done so far is package a
**fat jar**: declare the dependencies in `build.sbt`, then use `sbt assembly`
to bundle **all dependencies** into one big jar. The rest of the work is
copying the fat jar to the Spark master node and launching it with
`spark-submit`.
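
For context, here's roughly what the setup looks like, with the
`sbt-assembly` plugin added in `project/plugins.sbt` (a trimmed-down
sketch; the versions and the exact HBase modules are illustrative, not
the real ones from my project):

```scala
// build.sbt -- trimmed-down sketch; versions are illustrative
name := "spark-hbase-demo"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark is already on the cluster, so mark it "provided" to keep it
  // out of the fat jar
  "org.apache.spark" %% "spark-core"   % "2.1.0" % "provided",
  // the HBase client libraries are what actually get bundled
  "org.apache.hbase" %  "hbase-client" % "1.2.4",
  "org.apache.hbase" %  "hbase-common" % "1.2.4"
)
```

`sbt assembly` then drops one big `*-assembly-*.jar` under `target/`,
which is what I copy over and pass to `spark-submit`.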

The drawback of the "fat jar" approach is obvious: every dependency gets
packed in, yielding a huge jar file. Even worse, in my case a large number
of conflicting files from the packages in `~/.ivy2/cache` failed to merge,
and I had to manually set the `MergeStrategy` to `rename` for every
conflicting file to get past this.
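
Concretely, the workaround looks roughly like this in `build.sbt` (a
sketch; the real paths that collided in my build were different ones):

```scala
// sketch of the manual merge-strategy override; the matched paths are
// illustrative, not the actual conflicts from my build
assemblyMergeStrategy in assembly := {
  // rename the colliding copies instead of failing the merge
  case PathList("META-INF", "LICENSE") => MergeStrategy.rename
  case PathList("META-INF", "NOTICE")  => MergeStrategy.rename
  // everything else falls back to sbt-assembly's default strategy
  case other =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(other)
}
```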

Then I thought there should be an easier way: submit a "thin jar" together
with a `build.sbt`-like file specifying the dependencies, and have the
dependencies resolved automatically across the cluster before the actual
job is launched. I googled around but found nothing related. Is this
plausible, or is there a better way to achieve the same goal?
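
To make the idea concrete: the thin jar would hold only my own classes
(plain `sbt package`), and something like the snippet below would travel
with it for the cluster to resolve on its side. To be clear, this is
purely hypothetical, not an existing Spark or sbt feature -- just an
illustration of what I'm imagining:

```scala
// Hypothetical dependency manifest shipped alongside the thin jar --
// NOT an existing mechanism. The cluster would resolve these (and
// their transitive deps) itself before the job starts, instead of me
// baking everything into one fat jar.
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-client" % "1.2.4",
  "org.apache.hbase" % "hbase-common" % "1.2.4"
)
```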

Best regards,
Todd Leo
