Hey Spark Users,

I'm writing a demo with Spark and HBase. What I've done so far is package a
**fat jar**: declare the dependencies in `build.sbt`, then use `sbt assembly`
to bundle **all dependencies** into one big jar. The rest of the work is
copying the fat jar to the Spark master node and launching it with
`spark-submit`.
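
For context, here's roughly what the setup looks like, with the
`sbt-assembly` plugin added in `project/plugins.sbt` (a trimmed-down
sketch; the versions and the exact HBase modules are illustrative, not
the real ones from my project):

```scala
// build.sbt -- trimmed-down sketch; versions are illustrative
name := "spark-hbase-demo"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark is already on the cluster, so mark it "provided" to keep it
  // out of the fat jar
  "org.apache.spark" %% "spark-core"   % "2.1.0" % "provided",
  // the HBase client libraries are what actually get bundled
  "org.apache.hbase" %  "hbase-client" % "1.2.4",
  "org.apache.hbase" %  "hbase-common" % "1.2.4"
)
```

`sbt assembly` then drops one big `*-assembly-*.jar` under `target/`,
which is what I copy over and pass to `spark-submit`.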

The drawback of the "fat jar" approach is obvious: every dependency gets
packed in, yielding a huge jar file. Even worse, in my case a large number
of conflicting files from the packages in `~/.ivy2/cache` failed to merge,
and I had to manually set the `MergeStrategy` to `rename` for every
conflicting file to get past this.
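
Concretely, the workaround looks roughly like this in `build.sbt` (a
sketch; the real paths that collided in my build were different ones):

```scala
// sketch of the manual merge-strategy override; the matched paths are
// illustrative, not the actual conflicts from my build
assemblyMergeStrategy in assembly := {
  // rename the colliding copies instead of failing the merge
  case PathList("META-INF", "LICENSE") => MergeStrategy.rename
  case PathList("META-INF", "NOTICE")  => MergeStrategy.rename
  // everything else falls back to sbt-assembly's default strategy
  case other =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(other)
}
```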

Then I thought there should be an easier way: submit a "thin jar" together
with a `build.sbt`-like file specifying the dependencies, and have the
dependencies resolved automatically across the cluster before the actual
job is launched. I googled around but found nothing related. Is this
plausible, or is there a better way to achieve the same goal?
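
To make the idea concrete: the thin jar would hold only my own classes
(plain `sbt package`), and something like the snippet below would travel
with it for the cluster to resolve on its side. To be clear, this is
purely hypothetical, not an existing Spark or sbt feature -- just an
illustration of what I'm imagining:

```scala
// Hypothetical dependency manifest shipped alongside the thin jar --
// NOT an existing mechanism. The cluster would resolve these (and
// their transitive deps) itself before the job starts, instead of me
// baking everything into one fat jar.
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-client" % "1.2.4",
  "org.apache.hbase" % "hbase-common" % "1.2.4"
)
```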

Best regards,
Todd Leo
