Re: mvn or sbt for studying and developing Spark?
> * I moved from sbt to maven in June specifically due to Andrew Or's describing mvn as the default build tool. Developers should keep in mind that Jenkins uses mvn so we need to run mvn before submitting PRs - even if sbt were used for day-to-day dev work.

To be clear, I think that the PR builder actually uses sbt
https://github.com/apache/spark/blob/master/dev/run-tests#L198 currently, but there are master builds that make sure maven doesn't break (amongst other things).

> * In addition, as Sean has alluded to, IntelliJ seems to comprehend the maven builds a bit more readily than sbt.

Yeah, this is a very good point. I have used `sbt/sbt gen-idea` in the past, but I'm currently using the Maven integration of IntelliJ since it seems more stable.

> * But for command line and day-to-day dev purposes: sbt sounds great to use. Those sound bites you provided about exposing built-in test databases for hive and for displaying available test cases are sweet. Any easy/convenient way to see more of those kinds of facilities available through sbt?

The Spark SQL developer readme https://github.com/apache/spark/tree/master/sql has a little bit of this, but we really should have some documentation on using SBT as well.

> Integrating with those systems is generally easier if you are also working with Spark in Maven. (And I wouldn't classify all of those Maven-built systems as legacy, Michael :)

Also a good point, though I've seen some pretty clever uses of sbt's external project references to link Spark into other projects. I'll certainly admit I have a bias towards new shiny things in general though, so my definition of legacy is probably skewed :)
Re: mvn or sbt for studying and developing Spark?
The docs on using sbt are here:
https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt

They'll be published with 1.2.0, presumably.

On Mon, Nov 17, 2014 at 2:49 PM, Michael Armbrust mich...@databricks.com wrote:
> To be clear, I think that the PR builder actually uses sbt currently, but there are master builds that make sure maven doesn't break (amongst other things). ...
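Until those docs are published, day-to-day sbt usage against a Spark checkout of that era looked roughly like the sketch below; the subproject and suite names are only illustrative examples, and the exact task names may differ between Spark versions:

```shell
# From the root of a Spark source checkout.

# Launch the long-running interactive sbt console:
sbt/sbt

# Or run one-off commands non-interactively, e.g. compile a single
# subproject, or run a single test suite (example names shown):
sbt/sbt sql/compile
sbt/sbt "sql/test-only org.apache.spark.sql.CachedTableSuite"
```

The quoted command form is needed so the shell passes `sql/test-only` and the suite name to sbt as a single argument.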
Re: mvn or sbt for studying and developing Spark?
Hi Stephen and Sean,

Thanks for the correction.

On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen so...@cloudera.com wrote:
> No, the Maven build is the main one. I would use it unless you have a need to use the SBT build in particular.
>
> On Nov 16, 2014 2:58 AM, Dinesh J. Weerakkody dineshjweerakk...@gmail.com wrote:
>> Hi Yiming,
>>
>> I believe that both SBT and MVN are supported in Spark, but SBT is preferred (I'm not 100% sure about this :) ). When I was using MVN I got some build failures; after that I used SBT and it worked fine. You can go through these discussions regarding SBT vs MVN and learn the pros and cons of both [1] [2].
>>
>> [1] http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>> [2] https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>>
>> Thanks,
>>
>> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang sdi...@gmail.com wrote:
>>> Hi,
>>>
>>> I am new to developing Spark and my current focus is co-scheduling of Spark tasks. However, I am confused by the build tools: sometimes the documentation uses mvn and sometimes sbt. So my question is: which one is the preferred tool of the Spark community? And what's the technical difference between them? Thank you!
>>>
>>> Cheers,
>>> Yiming

--
Thanks Best Regards,
*Dinesh J. Weerakkody*
Re: mvn or sbt for studying and developing Spark?
I'm going to have to disagree here. If you are building a release distribution or integrating with legacy systems then maven is probably the correct choice. However, most of the core developers that I know use sbt, and I think it's a better choice for exploration and development overall. That said, this probably falls into the category of a religious argument, so you might want to look at both options and decide for yourself.

In my experience the SBT build is significantly faster with less effort (and I think sbt is still faster even if you go through the extra effort of installing zinc) and easier to read. The console mode of sbt (just run sbt/sbt and then a long-running console session is started that will accept further commands) is great for building individual subprojects or running single test suites. In addition to being faster since it's a long-running JVM, it's got a lot of nice features like tab-completion for test case names.

For example, if I wanted to see what test cases are available in the SQL subproject you can do the following:

[marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
[info] Loading project definition from /Users/marmbrus/workspace/spark/project/project
[info] Loading project definition from /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[info] Set current project to spark-parent (in build file:/Users/marmbrus/workspace/spark/)
> sql/test-only <tab>
--
org.apache.spark.sql.CachedTableSuite
org.apache.spark.sql.DataTypeSuite
org.apache.spark.sql.DslQuerySuite
org.apache.spark.sql.InsertIntoSuite
...

Another very useful feature is the development console, which starts an interactive REPL including the most recent version of the code and a lot of useful imports for some subprojects. For example, in the hive subproject it automatically sets up a temporary database with a bunch of test data pre-loaded:

$ sbt/sbt hive/console
...
import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive.test.TestHive._
import org.apache.spark.sql.parquet.ParquetTestData
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SELECT * FROM src").take(2)
res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])

Michael

On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody dineshjweerakk...@gmail.com wrote:
> Hi Stephen and Sean,
>
> Thanks for the correction. ...
Re: mvn or sbt for studying and developing Spark?
Yeah, my comment was mostly reflecting the fact that mvn is what creates the releases and is the 'build of reference', from which the SBT build is generated. The docs were recently changed to suggest that Maven is the default build and SBT is for advanced users.

I find Maven plays nicer with IDEs, or at least with IntelliJ. SBT is faster for incremental compilation and better for anyone who knows and can leverage SBT's model. If someone's new to it all, I dunno, they're likelier to have fewer problems using Maven to start? YMMV.

On Sun, Nov 16, 2014 at 9:23 PM, Michael Armbrust mich...@databricks.com wrote:
> I'm going to have to disagree here. If you are building a release distribution or integrating with legacy systems then maven is probably the correct choice. However most of the core developers that I know use sbt, and I think it's a better choice for exploration and development overall. ...
Re: mvn or sbt for studying and developing Spark?
Hi Michael,

That insight is useful. Some thoughts:

* I moved from sbt to maven in June specifically due to Andrew Or's describing mvn as the default build tool. Developers should keep in mind that Jenkins uses mvn, so we need to run mvn before submitting PRs - even if sbt were used for day-to-day dev work.

* In addition, as Sean has alluded to, IntelliJ seems to comprehend the maven builds a bit more readily than sbt.

* But for command line and day-to-day dev purposes: sbt sounds great to use. Those sound bites you provided about exposing built-in test databases for hive and for displaying available test cases are sweet. Any easy/convenient way to see more of those kinds of facilities available through sbt?

2014-11-16 13:23 GMT-08:00 Michael Armbrust mich...@databricks.com:
> I'm going to have to disagree here. If you are building a release distribution or integrating with legacy systems then maven is probably the correct choice. However most of the core developers that I know use sbt, and I think it's a better choice for exploration and development overall. ...
Re: mvn or sbt for studying and developing Spark?
> The console mode of sbt (just run sbt/sbt and then a long-running console session is started that will accept further commands) is great for building individual subprojects or running single test suites. In addition to being faster since it's a long-running JVM, it's got a lot of nice features like tab-completion for test case names.

We include the scala-maven-plugin in spark/pom.xml, so equivalent functionality is available using Maven. You can start a console session with `mvn scala:console`.

On Sun, Nov 16, 2014 at 1:23 PM, Michael Armbrust mich...@databricks.com wrote:
> I'm going to have to disagree here. If you are building a release distribution or integrating with legacy systems then maven is probably the correct choice. However most of the core developers that I know use sbt, and I think it's a better choice for exploration and development overall. ...
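As a rough sketch, the Maven-side console invocation looks like the following from a Spark source checkout (this assumes the scala-maven-plugin configuration already present in spark/pom.xml; the `-pl` module path shown is only a hypothetical example):

```shell
# From the root of a Spark source checkout.

# Start an interactive Scala REPL via the scala-maven-plugin:
mvn scala:console

# Maven's -pl flag can scope the invocation to a single module
# (module path shown is illustrative):
mvn -pl sql/core scala:console
```

Unlike `sbt/sbt hive/console`, this does not pre-load the test-database setup and imports; it is just a plain REPL with the module's classpath.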
Re: mvn or sbt for studying and developing Spark?
Neither is strictly optimal, which is why we ended up supporting both. Our reference build for packaging is Maven, so you are less likely to run into unexpected dependency issues, etc. Many developers use sbt as well. It's somewhat religion, and the best thing might be to try both and see which you prefer.

- Patrick

On Sun, Nov 16, 2014 at 1:47 PM, Mark Hamstra m...@clearstorydata.com wrote:
> We include the scala-maven-plugin in spark/pom.xml, so equivalent functionality is available using Maven. You can start a console session with `mvn scala:console`. ...
Re: mvn or sbt for studying and developing Spark?
OK, strictly speaking, that's equivalent to your second class of examples (the development console), not the first (the sbt console).

On Sun, Nov 16, 2014 at 1:47 PM, Mark Hamstra m...@clearstorydata.com wrote:
> We include the scala-maven-plugin in spark/pom.xml, so equivalent functionality is available using Maven. You can start a console session with `mvn scala:console`. ...
Re: mvn or sbt for studying and developing Spark?
Hi Dinesh, Sean, Michael, Stephen, Mark, and Patrick,

Thank you for your replies and discussions. So the conclusion is that mvn is preferred for packaging and distribution, while sbt is better for development. This also explains why the compilation tool of make-distribution.sh changed from sbt (in spark-0.9) to mvn (in spark-1.0).

Cheers,
Yiming

From: Dinesh J. Weerakkody [mailto:dineshjweerakk...@gmail.com]
Sent: November 16, 2014 10:58
To: sdi...@gmail.com
Cc: dev@spark.apache.org
Subject: Re: mvn or sbt for studying and developing Spark?

> Hi Yiming,
>
> I believe that both SBT and MVN are supported in Spark, but SBT is preferred (I'm not 100% sure about this :) ). When I was using MVN I got some build failures; after that I used SBT and it worked fine. ...
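To illustrate that split concretely, a Spark 1.x-era workflow looked something like the sketch below; the profile names passed to make-distribution.sh are examples only, and the supported flags should be checked against the script in your own checkout:

```shell
# From the root of a Spark source checkout.

# Packaging a binary distribution goes through the Maven-based script
# (profile names shown are illustrative for the 1.x era):
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4

# Day-to-day development iteration, by contrast, tends to go through
# the long-running sbt console:
sbt/sbt
```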
Re: mvn or sbt for studying and developing Spark?
More or less correct, but I'd add that there are an awful lot of software systems out there that use Maven. Integrating with those systems is generally easier if you are also working with Spark in Maven. (And I wouldn't classify all of those Maven-built systems as legacy, Michael :)

What that ends up meaning is that if you are working *on* Spark, then sbt can be more convenient and productive; but if you are working *with* Spark along with other significant pieces of software, then using Maven can be the better approach.

On Sun, Nov 16, 2014 at 6:11 PM, Yiming (John) Zhang sdi...@gmail.com wrote:

Hi Dinesh, Sean, Michael, Stephen, Mark, and Patrick,

Thank you for your replies and discussion. So the conclusion is that mvn is preferred for packaging and distribution, while sbt is better for day-to-day development. This also explains why the build tool used by make-distribution.sh changed from sbt (in Spark 0.9) to mvn (in Spark 1.0).

Cheers, Yiming
mvn or sbt for studying and developing Spark?
Hi,

I am new to developing Spark, and my current focus is the co-scheduling of Spark tasks. However, I am confused by the build tools: sometimes the documentation uses mvn and sometimes sbt. So my question is: which is the preferred tool of the Spark community, and what is the technical difference between them? Thank you!

Cheers, Yiming
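[Editor's note: for reference, the two builds asked about here are invoked roughly as follows from the root of a Spark 1.x source checkout. This is a sketch; exact scripts and flags vary across Spark versions.]

```shell
# Maven build: the reference build, used for releases and packaging
mvn -DskipTests clean package

# sbt build: incremental compilation makes it handy for iterative development
sbt/sbt assembly
```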
Re: mvn or sbt for studying and developing Spark?
Hi Yiming,

I believe that both sbt and Maven are supported in Spark, but that sbt is preferred (I'm not 100% sure about this :) ). When I used Maven I got some build failures; after switching to sbt everything worked fine. You can go through these discussions regarding sbt vs. Maven and learn the pros and cons of both [1] [2].

[1] http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
[2] https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ

Thanks,

On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang sdi...@gmail.com wrote:

Hi, I am new to developing Spark, and my current focus is the co-scheduling of Spark tasks. However, I am confused by the build tools: sometimes the documentation uses mvn and sometimes sbt. So my question is: which is the preferred tool of the Spark community, and what is the technical difference between them? Thank you!

Cheers, Yiming

--
Thanks & Best Regards,
Dinesh J. Weerakkody
Re: mvn or sbt for studying and developing Spark?
No, the Maven build is the main one. I would use it unless you have a need to use the sbt build in particular.

On Nov 16, 2014 2:58 AM, Dinesh J. Weerakkody dineshjweerakk...@gmail.com wrote:

Hi Yiming,

I believe that both sbt and Maven are supported in Spark, but that sbt is preferred (I'm not 100% sure about this :) ). When I used Maven I got some build failures; after switching to sbt everything worked fine. You can go through these discussions regarding sbt vs. Maven and learn the pros and cons of both [1] [2].

[1] http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
[2] https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ

--
Thanks & Best Regards,
Dinesh J. Weerakkody
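[Editor's note: the day-to-day development difference discussed in this thread is most visible when running a single test suite. A sketch for the Spark 1.x era follows; -DwildcardSuites comes from the scalatest-maven-plugin, "test-only" is the sbt 0.13-era command, and the suite name is only an example.]

```shell
# Maven: run a single suite via the scalatest-maven-plugin
mvn -DwildcardSuites=org.apache.spark.DistributedSuite test

# sbt: run the same suite; inside the sbt shell, recompilation is incremental,
# so repeated runs during development are much faster
sbt/sbt "test-only org.apache.spark.DistributedSuite"
```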