Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-06-03 Thread Pierre Borckmans
You might want to look at another great plugin: “sbt-pack”
https://github.com/xerial/sbt-pack

It collects all the dependency JARs and creates launch scripts for *nix
(including Mac OS) and Windows.
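
Roughly, the setup looks like this (a sketch assuming an sbt 0.13 build and a
0.6.x-era sbt-pack; the version number and the main class below are
illustrative, so check the plugin README for the current settings):

  // project/plugins.sbt
  addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.6.1")

  // build.sbt
  packSettings    // brings in the pack task and related settings

  // map a launch-script name to a main class
  packMain := Map("myapp" -> "com.example.Main")

Running `sbt pack` then produces target/pack/ with all dependency jars under
lib/ and start scripts under bin/ (including a .bat for Windows).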

HTH

Pierre



Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-06-02 Thread Andrei
Thanks! This is even closer to what I am looking for. I'm on a trip now, so
I'm going to give it a try when I come back.


On Mon, Jun 2, 2014 at 5:12 AM, Ngoc Dao ngocdaoth...@gmail.com wrote:

 Alternative solution:
 https://github.com/xitrum-framework/xitrum-package

 It collects all the dependency .jar files of your Scala program into a
 directory. It doesn't merge the .jar files together; they are left as is.





Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-30 Thread Andrei
Thanks, Stephen. I have eventually decided to go with assembly, but to leave
the Spark and Hadoop jars out of it and instead use `spark-submit` to provide
these dependencies automatically. This way no resource conflicts arise and
mergeStrategy needs no modification. To preserve this stable setup and also
share it with the community, I've put together a project [1] with a minimal
working config. It is an SBT project with the assembly plugin, Spark 1.0 and
Cloudera's Hadoop client. I hope it helps somebody get a Spark setup working
quicker.
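
The essential bit, as a sketch (versions, names and paths here are
illustrative rather than copied from the linked project), is marking Spark
and Hadoop as "provided" in build.sbt so they never end up in the assembly:

  // Compile against Spark and Hadoop, but keep them out of the assembled jar;
  // spark-submit puts them on the classpath at runtime.
  libraryDependencies ++= Seq(
    "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
    "org.apache.hadoop" %  "hadoop-client" % "2.3.0" % "provided"  // or a CDH build
  )

The job is then launched with something like:

  spark-submit --class com.example.Main --master local[*] \
    target/scala-2.10/myapp-assembly-0.1.jar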

Though I'm fine with this setup for final builds, I'm still looking for a
more interactive dev setup - something that doesn't require a full rebuild.

[1]: https://github.com/faithlessfriend/sample-spark-project

Thanks and have a good weekend,
Andrei




Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-29 Thread jaranda
Hi Andrei,

I think the preferred way to deploy Spark jobs is by using the sbt package
task instead of the sbt assembly plugin. In any case, as you comment, the
mergeStrategy in combination with some dependency exclusions should fix
your problems. Have a look at this gist for further details:
https://gist.github.com/JordiAranda/bdbad58d128c14277a05
(I just followed some recommendations from the sbt assembly plugin
documentation).
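
For example, a single per-dependency exclusion in build.sbt looks roughly
like this (the excluded artifact is only meant to illustrate the mechanism,
not a specific conflict you will necessarily hit):

  // drop a transitive jar that clashes with one pulled in elsewhere
  libraryDependencies +=
    "org.apache.spark" %% "spark-core" % "1.0.0" exclude("org.mortbay.jetty", "servlet-api")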

Up to now I haven't found a proper way to combine my development and
deployment phases, although I must say my experience with Spark is pretty
limited (it really depends on your deployment requirements as well). Someone
else could probably give you some further insights here.

Best,





Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-29 Thread Andrei
Thanks, Jordi, your gist looks pretty much like what I have in my project
currently (with a few exceptions that I'm going to borrow).

I like the idea of using sbt package, since it doesn't require third-party
plugins and, most importantly, doesn't create a mess of classes and
resources. But in this case I'll have to handle the jar list manually via the
Spark context. Is there a way to automate this process? E.g., when I was a
Clojure guy, I could run lein deps (lein is a build tool similar to sbt) to
download all dependencies and then just enumerate them from my app. Maybe
you have heard of something like that for Spark/SBT?
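
One rough way to approximate lein deps here - a sketch assuming
retrieveManaged := true in build.sbt so that sbt copies the dependency jars
under lib_managed/, and with the app jar path below made up - would be to
enumerate that directory and hand the jars to the Spark context:

  import java.io.File
  import org.apache.spark.{SparkConf, SparkContext}

  // recursively collect every .jar below a directory
  def jarsUnder(dir: File): Seq[String] =
    Option(dir.listFiles).getOrElse(Array.empty[File]).toSeq.flatMap { f =>
      if (f.isDirectory) jarsUnder(f)
      else if (f.getName.endsWith(".jar")) Seq(f.getAbsolutePath)
      else Seq.empty
    }

  val conf = new SparkConf()
    .setAppName("my-app")
    .setMaster("spark://master:7077")   // illustrative
    .setJars(jarsUnder(new File("lib_managed")) :+ "target/scala-2.10/my-app_2.10-0.1.jar")
  val sc = new SparkContext(conf)

But I'd prefer something the build tool handles for me.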

Thanks,
Andrei




Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-29 Thread Stephen Boesch
The MergeStrategy combined with sbt assembly did work for me. This is not
painless: it takes some trial and error, and the assembly may take several
minutes to build.
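
For reference, the setting has roughly this shape (a sketch in the
sbt-assembly 0.11.x syntax that was current at the time, on top of
assemblySettings; newer plugin versions use assemblyMergeStrategy instead):

  import sbtassembly.Plugin._
  import AssemblyKeys._

  mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
    {
      // blunt example: drop clashing META-INF entries; real builds
      // usually match individual paths more precisely
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x                             => old(x)
    }
  }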

You will likely want to filter some additional classes out of the generated
jar file. Here is a Stack Overflow answer that explains this, with (IMHO) the
best answer's snippet included below (in that case the OP understandably did
not want to include javax.servlet.Servlet):

http://stackoverflow.com/questions/7819066/sbt-exclude-class-from-jar


mappings in (Compile, packageBin) ~= { (ms: Seq[(File, String)]) =>
  ms filter { case (file, toPath) => toPath != "javax/servlet/Servlet.class" }
}

There is also a setting to exclude your own project files from the assembly,
but I do not recall it at the moment.


