Re: mvn or sbt for studying and developing Spark?

2014-11-17 Thread Michael Armbrust

 * I moved from sbt to maven in June specifically due to Andrew Or's
 describing mvn as the default build tool.  Developers should keep in mind
 that Jenkins uses mvn, so we need to run mvn before submitting PRs - even
 if sbt were used for day-to-day dev work.


To be clear, I think that the PR builder actually uses sbt
https://github.com/apache/spark/blob/master/dev/run-tests#L198 currently,
but there are master builds that make sure maven doesn't break (amongst
other things).
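(For reference, the same checks can be run locally before opening a PR --
this is just the script linked above, invoked from a spark checkout:)

$ ./dev/run-tests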


 * In addition, as Sean has alluded to, IntelliJ seems to comprehend
 the Maven builds a bit more readily than sbt.


Yeah, this is a very good point.  I have used `sbt/sbt gen-idea` in the
past, but I'm currently using the Maven integration of IntelliJ since it
seems more stable.


 * But for command line and day-to-day dev purposes: sbt sounds great to
 use.  Those sound bites you provided about exposing built-in test databases
 for hive and for displaying available test cases are sweet.  Any
 easy/convenient way to see more of those kinds of facilities available
 through sbt?


The Spark SQL developer readme
https://github.com/apache/spark/tree/master/sql has a little bit of this,
but we really should have some documentation on using SBT as well.
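(In the meantime, a couple of generic sbt console commands help with
discovery -- these are standard sbt, not Spark-specific, so treat this as a
sketch:)

> projects
> tasks -v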

 Integrating with those systems is generally easier if you are also working
 with Spark in Maven.  (And I wouldn't classify all of those Maven-built
 systems as legacy, Michael :)


Also a good point, though I've seen some pretty clever uses of sbt's
external project references to link Spark into other projects.  I'll
certainly admit I have a bias towards new shiny things in general, though,
so my definition of legacy is probably skewed :)
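(For the curious, a minimal build.sbt sketch of such an external project
reference -- the local path and the "core" project id here are hypothetical:)

lazy val sparkCore = ProjectRef(file("../spark"), "core")
lazy val myApp = (project in file(".")).dependsOn(sparkCore)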


Re: mvn or sbt for studying and developing Spark?

2014-11-17 Thread Nicholas Chammas
The docs on using sbt are here:
https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt

They'll be published with 1.2.0 presumably.


Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Dinesh J. Weerakkody
Hi Stephen and Sean,

Thanks for the correction.

On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen so...@cloudera.com wrote:

 No, the Maven build is the main one.  I would use it unless you have a
 need to use the SBT build in particular.


-- 
Thanks & Best Regards,

*Dinesh J. Weerakkody*


Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Michael Armbrust
I'm going to have to disagree here.  If you are building a release
distribution or integrating with legacy systems then maven is probably the
correct choice.  However, most of the core developers that I know use sbt,
and I think it's a better choice for exploration and development overall.
That said, this probably falls into the category of a religious argument, so
you might want to look at both options and decide for yourself.

In my experience the SBT build is significantly faster with less effort
(and I think sbt is still faster even if you go through the extra effort of
installing zinc) and easier to read.  The console mode of sbt (just run
sbt/sbt and a long-running console session is started that will accept
further commands) is great for building individual subprojects or running
single test suites.  In addition to being faster since it's a long-running
JVM, it's got a lot of nice features like tab-completion for test case names.
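(An aside on the zinc setup mentioned above -- a sketch assuming the
standalone zinc launcher is installed and on your PATH; the Maven build
picks up the running compile server via the scala-maven-plugin:)

$ zinc -start
$ mvn -DskipTests clean package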

For example, if I wanted to see what test cases are available in the SQL
subproject, I could do the following:

[marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
[info] Loading project definition from
/Users/marmbrus/workspace/spark/project/project
[info] Loading project definition from
/Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[info] Set current project to spark-parent (in build
file:/Users/marmbrus/workspace/spark/)
> sql/test-only *tab*
--
org.apache.spark.sql.CachedTableSuite
org.apache.spark.sql.DataTypeSuite
org.apache.spark.sql.DslQuerySuite
org.apache.spark.sql.InsertIntoSuite
...
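(From there a single suite can be run directly -- a hypothetical invocation
picking one of the suites listed above:)

> sql/test-only org.apache.spark.sql.CachedTableSuite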

Another very useful feature is the development console, which starts an
interactive REPL including the most recent version of the code and a lot of
useful imports for some subprojects.  For example in the hive subproject it
automatically sets up a temporary database with a bunch of test data
pre-loaded:

$ sbt/sbt hive/console
...
import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive.test.TestHive._
import org.apache.spark.sql.parquet.ParquetTestData
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SELECT * FROM src").take(2)
res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
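(The same long-running session covers the build side too; these are standard
sbt commands rather than anything Spark-specific -- a sketch:)

> core/compile
> ~sql/test-only org.apache.spark.sql.DslQuerySuite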

Michael



Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Sean Owen
Yeah, my comment was mostly reflecting the fact that mvn is what
creates the releases and is the 'build of reference', from which the
SBT build is generated. The docs were recently changed to suggest that
Maven is the default build and SBT is for advanced users. I find Maven
plays nicer with IDEs, or at least, IntelliJ.

SBT is faster for incremental compilation and better for anyone who
knows and can leverage SBT's model.

If someone's new to it all, I dunno, they're likelier to have fewer
problems using Maven to start? YMMV.



Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Stephen Boesch
Hi Michael,

That insight is useful.  Some thoughts:

* I moved from sbt to maven in June specifically due to Andrew Or's
describing mvn as the default build tool.  Developers should keep in mind
that Jenkins uses mvn, so we need to run mvn before submitting PRs - even
if sbt were used for day-to-day dev work.
* In addition, as Sean has alluded to, IntelliJ seems to comprehend
the Maven builds a bit more readily than sbt.
* But for command line and day-to-day dev purposes: sbt sounds great to
use.  Those sound bites you provided about exposing built-in test databases
for hive and for displaying available test cases are sweet.  Any
easy/convenient way to see more of those kinds of facilities available
through sbt?




Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Mark Hamstra

 The console mode of sbt (just run
 sbt/sbt and a long-running console session is started that will accept
 further commands) is great for building individual subprojects or running
 single test suites.  In addition to being faster since it's a long-running
 JVM, it's got a lot of nice features like tab-completion for test case
 names.


We include the scala-maven-plugin in spark/pom.xml, so equivalent
functionality is available using Maven.  You can start a console session
with `mvn scala:console`.
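(A sketch of what that might look like -- the subproject chosen and the
import are purely illustrative:)

$ cd sql/core
$ mvn scala:console
scala> import org.apache.spark.sql._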





Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Patrick Wendell
Neither is strictly optimal, which is why we ended up supporting both.
Our reference build for packaging is Maven, so you are less likely to
run into unexpected dependency issues, etc.  Many developers use sbt as
well.  It's somewhat a matter of religion, and the best thing might be to
try both and see which you prefer.
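(For concreteness, a sketch of trying both -- these mirror the basic
packaging commands from the build documentation of the time, Hadoop profiles
omitted:)

$ mvn -DskipTests clean package
$ sbt/sbt assembly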

- Patrick


Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Mark Hamstra
Ok, strictly speaking, that's equivalent to your second class of examples,
the development console, not the first, sbt's console mode.





re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Yiming (John) Zhang
Hi Dinesh, Sean, Michael, Stephen, Mark, and Patrick,

 

Thank you for your replies and the discussion. So the conclusion is that mvn
is preferred for packaging and distribution, while sbt is better for
development. This also explains why the compilation tool of
make-distribution.sh changed from sbt (in spark-0.9) to mvn (in spark-1.0).
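(For reference, a sketch of a make-distribution.sh invocation from the 1.x
line -- the flags and profiles shown are illustrative only:)

$ ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn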

 

Cheers,

Yiming

 




Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Mark Hamstra
More or less correct, but I'd add that there are an awful lot of software
systems out there that use Maven.  Integrating with those systems is
generally easier if you are also working with Spark in Maven.  (And I
wouldn't classify all of those Maven-built systems as legacy, Michael :)
What that ends up meaning is that if you are working *on* Spark, then SBT
can be more convenient and productive; but if you are working *with* Spark
along with other significant pieces of software, then using Maven can be
the better approach.
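(To illustrate the working-*with*-Spark case, a minimal Maven dependency
sketch -- the artifact and version are simply those current at the time:)

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
</dependency>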




mvn or sbt for studying and developing Spark?

2014-11-15 Thread Yiming (John) Zhang
Hi,

 

I am new to developing Spark and my current focus is co-scheduling of
Spark tasks. However, I am confused about the build tools: sometimes the
documentation uses mvn but sometimes sbt.

 

So, my question is: which one is the preferred tool of the Spark community?
And what's the technical difference between them? Thank you!

 

Cheers,

Yiming



Re: mvn or sbt for studying and developing Spark?

2014-11-15 Thread Dinesh J. Weerakkody
Hi Yiming,

I believe that both SBT and MVN are supported in Spark, but SBT is preferred
(I'm not 100% sure about this :) ). When I used MVN I got some build
failures; after that I used SBT and it worked fine.

You can go through these discussions regarding SBT vs MVN and learn the pros
and cons of both [1] [2].

[1]
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html

[2]
https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ

Thanks,




-- 
Thanks & Best Regards,

*Dinesh J. Weerakkody*


Re: mvn or sbt for studying and developing Spark?

2014-11-15 Thread Sean Owen
No, the Maven build is the main one.  I would use it unless you have a need
to use the SBT build in particular.
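(Since the Maven build is modular, a single subproject can also be built on
its own -- standard Maven flags, shown as a sketch:)

$ mvn -DskipTests -pl core -am package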