Re: State of the Build

2015-11-06 Thread Michael Armbrust
It's not included; it is downloaded on demand.

That said, I think the fact that we can download the jar is a huge feature of
SBT: no installation is needed, and you can build the project as long as you
have a JVM.
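
(For context, a rough Scala sketch of that download-on-demand idea. The real
wrapper under build/ is a shell script; the URL and the exact path below are
illustrative, not the actual ones.)

    import java.io.File
    import java.net.URL
    import scala.sys.process._

    object FetchSbtLauncher {
      def main(args: Array[String]): Unit = {
        // Fetch the sbt launcher only if it is not already cached locally.
        val jar = new File("build/sbt-launch.jar")
        val url = "https://repo.typesafe.com/typesafe/ivy-releases/" +
          "org.scala-sbt/sbt-launch/0.13.7/sbt-launch.jar" // illustrative
        if (!jar.exists) (new URL(url) #> jar).! // download once, reuse after
      }
    }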

On Fri, Nov 6, 2015 at 4:49 PM, Jakob Odersky  wrote:

> > Can you clarify which sbt jar (by path) ?
> Any of them.
> Sbt is a build tool, and I don't understand why it is included in a source
> repository. It would be like including make in a project.
>
> On 6 November 2015 at 16:43, Ted Yu  wrote:
>
>> bq. include an sbt jar in the source repo
>>
>> Can you clarify which sbt jar (by path) ?
>>
>> I tried 'git log' on the following files but didn't see commit history:
>>
>> ./build/sbt-launch-0.13.7.jar
>> ./build/zinc-0.3.5.3/lib/sbt-interface.jar
>> ./sbt/sbt-launch-0.13.2.jar
>> ./sbt/sbt-launch-0.13.5.jar
>>
>> On Fri, Nov 6, 2015 at 4:25 PM, Jakob Odersky  wrote:
>>
>>> [Reposting to the list again, I really should double-check that
>>> reply-to-all button]
>>>
>>> In the meantime, as a light Friday-afternoon patch, I was thinking about
>>> splitting the ~600-LOC single-file sbt build into something more manageable,
>>> like the Akka build (without changing any dependencies or settings). I know
>>> it's pretty trivial and not very important, but it might make things easier
>>> to add/refactor in the future.
>>>
>>> Also, why do we include an sbt jar in the source repo, especially if it
>>> is used only as an internal development tool?
>>>
>>> On 6 November 2015 at 15:29, Patrick Wendell  wrote:
>>>
 I think there are a few minor differences in the dependency graph that
 arise from this. For a given user, the probability it affects them is low -
 it needs to conflict with a library a user application is using. However
 the probability it affects *some users* is very high and we do see small
 changes crop up fairly frequently.

 My feeling is mostly pragmatic... if we can get things working to
 standardize on Maven-style resolution by upgrading SBT, let's do it. If
 that's not tenable, we can evaluate alternatives.

 - Patrick

 On Fri, Nov 6, 2015 at 3:07 PM, Marcelo Vanzin 
 wrote:

> On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers 
> wrote:
> > if i understand it correctly it would cause compatibility breaks for
> > applications on top of spark, because those applications use the
> exact same
> > current resolution logic (so basically they are maven apps), and the
> change
> > would make them inconsistent?
>
> I think Patrick said it could cause compatibility breaks because
> switching to sbt's version resolution means Spark's dependency tree
> would change. Just to cite the recent example, you'd get Guava 16
> instead of 14 (let's ignore that Guava is currently mostly shaded in
> Spark), so if your app depended transitively on Guava and used APIs
> from 14 that are not on 16, it would break.
>
> --
> Marcelo
>


>>>
>>
>


Re: State of the Build

2015-11-06 Thread Jakob Odersky
> Can you clarify which sbt jar (by path) ?
Any of them.
Sbt is a build tool, and I don't understand why it is included in a source
repository. It would be like including make in a project.

On 6 November 2015 at 16:43, Ted Yu  wrote:

> bq. include an sbt jar in the source repo
>
> Can you clarify which sbt jar (by path) ?
>
> I tried 'git log' on the following files but didn't see commit history:
>
> ./build/sbt-launch-0.13.7.jar
> ./build/zinc-0.3.5.3/lib/sbt-interface.jar
> ./sbt/sbt-launch-0.13.2.jar
> ./sbt/sbt-launch-0.13.5.jar
>
> On Fri, Nov 6, 2015 at 4:25 PM, Jakob Odersky  wrote:
>
>> [Reposting to the list again, I really should double-check that
>> reply-to-all button]
>>
>> In the meantime, as a light Friday-afternoon patch, I was thinking about
>> splitting the ~600-LOC single-file sbt build into something more manageable,
>> like the Akka build (without changing any dependencies or settings). I know
>> it's pretty trivial and not very important, but it might make things easier
>> to add/refactor in the future.
>>
>> Also, why do we include an sbt jar in the source repo, especially if it
>> is used only as an internal development tool?
>>
>> On 6 November 2015 at 15:29, Patrick Wendell  wrote:
>>
>>> I think there are a few minor differences in the dependency graph that
>>> arise from this. For a given user, the probability it affects them is low -
>>> it needs to conflict with a library a user application is using. However
>>> the probability it affects *some users* is very high and we do see small
>>> changes crop up fairly frequently.
>>>
>>> My feeling is mostly pragmatic... if we can get things working to
>>> standardize on Maven-style resolution by upgrading SBT, let's do it. If
>>> that's not tenable, we can evaluate alternatives.
>>>
>>> - Patrick
>>>
>>> On Fri, Nov 6, 2015 at 3:07 PM, Marcelo Vanzin 
>>> wrote:
>>>
 On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers 
 wrote:
 > if i understand it correctly it would cause compatibility breaks for
 > applications on top of spark, because those applications use the
 exact same
 > current resolution logic (so basically they are maven apps), and the
 change
 > would make them inconsistent?

 I think Patrick said it could cause compatibility breaks because
 switching to sbt's version resolution means Spark's dependency tree
 would change. Just to cite the recent example, you'd get Guava 16
 instead of 14 (let's ignore that Guava is currently mostly shaded in
 Spark), so if your app depended transitively on Guava and used APIs
 from 14 that are not on 16, it would break.

 --
 Marcelo

>>>
>>>
>>
>


Re: State of the Build

2015-11-06 Thread Ted Yu
bq. include an sbt jar in the source repo

Can you clarify which sbt jar (by path) ?

I tried 'git log' on the following files but didn't see commit history:

./build/sbt-launch-0.13.7.jar
./build/zinc-0.3.5.3/lib/sbt-interface.jar
./sbt/sbt-launch-0.13.2.jar
./sbt/sbt-launch-0.13.5.jar

On Fri, Nov 6, 2015 at 4:25 PM, Jakob Odersky  wrote:

> [Reposting to the list again, I really should double-check that
> reply-to-all button]
>
> In the meantime, as a light Friday-afternoon patch, I was thinking about
> splitting the ~600-LOC single-file sbt build into something more manageable,
> like the Akka build (without changing any dependencies or settings). I know
> it's pretty trivial and not very important, but it might make things easier
> to add/refactor in the future.
>
> Also, why do we include an sbt jar in the source repo, especially if it is
> used only as an internal development tool?
>
> On 6 November 2015 at 15:29, Patrick Wendell  wrote:
>
>> I think there are a few minor differences in the dependency graph that
>> arise from this. For a given user, the probability it affects them is low -
>> it needs to conflict with a library a user application is using. However
>> the probability it affects *some users* is very high and we do see small
>> changes crop up fairly frequently.
>>
>> My feeling is mostly pragmatic... if we can get things working to
>> standardize on Maven-style resolution by upgrading SBT, let's do it. If
>> that's not tenable, we can evaluate alternatives.
>>
>> - Patrick
>>
>> On Fri, Nov 6, 2015 at 3:07 PM, Marcelo Vanzin 
>> wrote:
>>
>>> On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers  wrote:
>>> > if i understand it correctly it would cause compatibility breaks for
>>> > applications on top of spark, because those applications use the exact
>>> same
>>> > current resolution logic (so basically they are maven apps), and the
>>> change
>>> > would make them inconsistent?
>>>
>>> I think Patrick said it could cause compatibility breaks because
>>> switching to sbt's version resolution means Spark's dependency tree
>>> would change. Just to cite the recent example, you'd get Guava 16
>>> instead of 14 (let's ignore that Guava is currently mostly shaded in
>>> Spark), so if your app depended transitively on Guava and used APIs
>>> from 14 that are not on 16, it would break.
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>


Re: State of the Build

2015-11-06 Thread Jakob Odersky
[Reposting to the list again, I really should double-check that
reply-to-all button]

In the meantime, as a light Friday-afternoon patch, I was thinking about
splitting the ~600-LOC single-file sbt build into something more manageable,
like the Akka build (without changing any dependencies or settings). I know
it's pretty trivial and not very important, but it might make things easier
to add/refactor in the future.
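
(As a sketch of the kind of split meant here, sbt 0.13 style; the module
names and settings are invented for illustration, not Spark's actual ones.)

    import sbt._
    import Keys._

    // Hypothetical project/ExampleBuild.scala: shared settings are factored
    // out once, and each module gets a small project definition of its own
    // instead of living in a single monolithic build file.
    object ExampleBuild extends Build {
      lazy val commonSettings = Seq(
        organization := "org.example",
        version := "0.1.0-SNAPSHOT",
        scalaVersion := "2.10.5"
      )

      lazy val core = Project("core", file("core"))
        .settings(commonSettings: _*)

      lazy val sql = Project("sql", file("sql"))
        .dependsOn(core)
        .settings(commonSettings: _*)
    }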

Also, why do we include an sbt jar in the source repo, especially if it is
used only as an internal development tool?

On 6 November 2015 at 15:29, Patrick Wendell  wrote:

> I think there are a few minor differences in the dependency graph that
> arise from this. For a given user, the probability it affects them is low -
> it needs to conflict with a library a user application is using. However
> the probability it affects *some users* is very high and we do see small
> changes crop up fairly frequently.
>
> My feeling is mostly pragmatic... if we can get things working to
> standardize on Maven-style resolution by upgrading SBT, let's do it. If
> that's not tenable, we can evaluate alternatives.
>
> - Patrick
>
> On Fri, Nov 6, 2015 at 3:07 PM, Marcelo Vanzin 
> wrote:
>
>> On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers  wrote:
>> > if i understand it correctly it would cause compatibility breaks for
>> > applications on top of spark, because those applications use the exact
>> same
>> > current resolution logic (so basically they are maven apps), and the
>> change
>> > would make them inconsistent?
>>
>> I think Patrick said it could cause compatibility breaks because
>> switching to sbt's version resolution means Spark's dependency tree
>> would change. Just to cite the recent example, you'd get Guava 16
>> instead of 14 (let's ignore that Guava is currently mostly shaded in
>> Spark), so if your app depended transitively on Guava and used APIs
>> from 14 that are not on 16, it would break.
>>
>> --
>> Marcelo
>>
>
>


Re: State of the Build

2015-11-06 Thread Koert Kuipers
oh ok i think i got it... i hope! since the app runs with the spark
assembly jar on its classpath, the exact version as resolved by spark's
build process is actually on the app's classpath.

sorry, didn't mean to pollute this thread with my own dependency resolution
confusion.


On Fri, Nov 6, 2015 at 6:07 PM, Marcelo Vanzin  wrote:

> On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers  wrote:
> > if i understand it correctly it would cause compatibility breaks for
> > applications on top of spark, because those applications use the exact
> same
> > current resolution logic (so basically they are maven apps), and the
> change
> > would make them inconsistent?
>
> I think Patrick said it could cause compatibility breaks because
> switching to sbt's version resolution means Spark's dependency tree
> would change. Just to cite the recent example, you'd get Guava 16
> instead of 14 (let's ignore that Guava is currently mostly shaded in
> Spark), so if your app depended transitively on Guava and used APIs
> from 14 that are not on 16, it would break.
>
> --
> Marcelo
>


Re: State of the Build

2015-11-06 Thread Patrick Wendell
I think there are a few minor differences in the dependency graph that
arise from this. For a given user, the probability it affects them is low -
it needs to conflict with a library a user application is using. However
the probability it affects *some users* is very high and we do see small
changes crop up fairly frequently.

My feeling is mostly pragmatic... if we can get things working to
standardize on Maven-style resolution by upgrading SBT, let's do it. If
that's not tenable, we can evaluate alternatives.

- Patrick

On Fri, Nov 6, 2015 at 3:07 PM, Marcelo Vanzin  wrote:

> On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers  wrote:
> > if i understand it correctly it would cause compatibility breaks for
> > applications on top of spark, because those applications use the exact
> same
> > current resolution logic (so basically they are maven apps), and the
> change
> > would make them inconsistent?
>
> I think Patrick said it could cause compatibility breaks because
> switching to sbt's version resolution means Spark's dependency tree
> would change. Just to cite the recent example, you'd get Guava 16
> instead of 14 (let's ignore that Guava is currently mostly shaded in
> Spark), so if your app depended transitively on Guava and used APIs
> from 14 that are not on 16, it would break.
>
> --
> Marcelo
>


Re: State of the Build

2015-11-06 Thread Marcelo Vanzin
On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers  wrote:
> if i understand it correctly it would cause compatibility breaks for
> applications on top of spark, because those applications use the exact same
> current resolution logic (so basically they are maven apps), and the change
> would make them inconsistent?

I think Patrick said it could cause compatibility breaks because
switching to sbt's version resolution means Spark's dependency tree
would change. Just to cite the recent example, you'd get Guava 16
instead of 14 (let's ignore that Guava is currently mostly shaded in
Spark), so if your app depended transitively on Guava and used APIs
from 14 that are not on 16, it would break.
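
(A hypothetical build.sbt fragment for the app side of this, with the Guava
coordinates used purely because they are the example above: pinning the
transitive version explicitly makes sbt's Ivy-style "latest wins" resolution
land on the same version that Maven's "nearest wins" picked.)

    // Force the transitive Guava version back to what Maven-style
    // resolution produced; sbt 0.13 honours dependencyOverrides during
    // conflict resolution.
    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1"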

-- 
Marcelo




Re: State of the Build

2015-11-06 Thread Koert Kuipers
that's interesting...

if i understand it correctly it would cause compatibility breaks for
applications on top of spark, because those applications use the exact same
current resolution logic (so basically they are maven apps), and the change
would make them inconsistent?

by that logic all existing applications on top of spark that do not use
maven are already in danger of incompatibility breaks?
or am i completely misunderstanding?

this makes the implications of spark switching to maven a while back
somewhat more serious than i realized.

on the other hand we use sbt for our spark apps and this has never been an
issue for us, so i am not sure how real/serious this compatibility issue is.
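
(For illustration of why the two resolution worlds line up today, with
hypothetical coordinates: a downstream sbt app pulls Spark's transitive
dependency list from the pom that Spark publishes, so both builds start from
the same Maven metadata.)

    // build.sbt of a hypothetical downstream app: Spark's transitive
    // dependencies are read from Spark's published pom at resolution time.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"
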
On Fri, Nov 6, 2015 at 5:35 PM, Patrick Wendell  wrote:

> I think we'd have to standardize on Maven-style resolution, or I'd at
> least like to see that path explored first. The issue is if we switch the
> standard now, it could cause compatibility breaks for applications on top
> of Spark.
>
> On Fri, Nov 6, 2015 at 2:28 PM, Jakob Odersky  wrote:
>
>> Reposting to the list...
>>
>> Thanks for all the feedback everyone, I get a clearer picture of the
>> reasoning and implications now.
>>
>> Koert, according to your post in this thread
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023,
>> it is apparently very easy to change the maven resolution mechanism to the
>> ivy one.
>> Patrick, would this not help with the problems you described?
>>
>> On 5 November 2015 at 23:23, Patrick Wendell  wrote:
>>
>>> Hey Jakob,
>>>
>>> The builds in Spark are largely maintained by me, Sean, and Michael
>>> Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
>>> SBT build. Maven is the build of reference for packaging Spark and is used
>>> by many downstream packagers and to build all Spark releases. SBT is more
>>> often used by developers. Both builds inherit from the same pom files (and
>>> rely on the same profiles) to minimize maintenance complexity of Spark's
>>> very complex dependency graph.
>>>
>>> If you are looking to make contributions that help with the build, I am
>>> happy to point you towards some things that are consistent maintenance
>>> headaches. There are two major pain points right now that I'd be thrilled
>>> to see fixes for:
>>>
>>> 1. SBT relies on a different dependency conflict resolution strategy
>>> than maven - causing all kinds of headaches for us. I have heard that newer
>>> versions of SBT can (maybe?) use Maven as a dependency resolver instead of
>>> Ivy. This would make our life so much better if it were possible, either by
>>> virtue of upgrading SBT or somehow doing this ourselves.
>>>
>>> 2. We don't have a great way of auditing the net effect of dependency
>>> changes when people make them in the build. I am working on a fairly clunky
>>> patch to do this here:
>>>
>>> https://github.com/apache/spark/pull/8531
>>>
>>> It could be done much more nicely using SBT, but only provided (1) is
>>> solved.
>>>
>>> Doing a major overhaul of the sbt build to decouple it from pom files,
>>> I'm not sure that's the best place to start, given that we need to continue
>>> to support maven - the coupling is intentional. But getting involved in the
>>> build in general would be completely welcome.
>>>
>>> - Patrick
>>>
>>> On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:
>>>
 Maven isn't 'legacy', or supported for the benefit of third parties.
 SBT had some behaviors / problems that Maven didn't relative to what
 Spark needs. SBT is a development-time alternative only, and partly
 generated from the Maven build.

 On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers 
 wrote:
 > People who do upstream builds of spark (think bigtop and hadoop
 distros) are
 > used to legacy systems like maven, so maven is the default build. I
 don't
 > think it will change.
 >
 > Any improvements for the sbt build are of course welcome (it is still
 used
 > by many developers), but i would not do anything that increases the
 burden
 > of maintaining two build systems.
 >
 > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
 >>
 >> Hi everyone,
 >> in the process of learning Spark, I wanted to get an overview of the
 >> interaction between all of its sub-projects. I therefore decided to
 have a
 >> look at the build setup and its dependency management.
 >> Since I am a lot more comfortable using sbt than maven, I decided to
 try to
 >> port the maven configuration to sbt (with the help of automated
 tools).
 >> This led me to a couple of observations and questions on the build
 system
 >> design:
 >>
 >> First, currently, there are two build systems, maven and sbt. Is there a
 >> preferred tool (or future direction to one)?

Re: State of the Build

2015-11-06 Thread Patrick Wendell
I think we'd have to standardize on Maven-style resolution, or I'd at least
like to see that path explored first. The issue is if we switch the
standard now, it could cause compatibility breaks for applications on top
of Spark.

On Fri, Nov 6, 2015 at 2:28 PM, Jakob Odersky  wrote:

> Reposting to the list...
>
> Thanks for all the feedback everyone, I get a clearer picture of the
> reasoning and implications now.
>
> Koert, according to your post in this thread
> http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023,
> it is apparently very easy to change the maven resolution mechanism to the
> ivy one.
> Patrick, would this not help with the problems you described?
>
> On 5 November 2015 at 23:23, Patrick Wendell  wrote:
>
>> Hey Jakob,
>>
>> The builds in Spark are largely maintained by me, Sean, and Michael
>> Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
>> SBT build. Maven is the build of reference for packaging Spark and is used
>> by many downstream packagers and to build all Spark releases. SBT is more
>> often used by developers. Both builds inherit from the same pom files (and
>> rely on the same profiles) to minimize maintenance complexity of Spark's
>> very complex dependency graph.
>>
>> If you are looking to make contributions that help with the build, I am
>> happy to point you towards some things that are consistent maintenance
>> headaches. There are two major pain points right now that I'd be thrilled
>> to see fixes for:
>>
>> 1. SBT relies on a different dependency conflict resolution strategy than
>> maven - causing all kinds of headaches for us. I have heard that newer
>> versions of SBT can (maybe?) use Maven as a dependency resolver instead of
>> Ivy. This would make our life so much better if it were possible, either by
>> virtue of upgrading SBT or somehow doing this ourselves.
>>
>> 2. We don't have a great way of auditing the net effect of dependency
>> changes when people make them in the build. I am working on a fairly clunky
>> patch to do this here:
>>
>> https://github.com/apache/spark/pull/8531
>>
>> It could be done much more nicely using SBT, but only provided (1) is
>> solved.
>>
>> Doing a major overhaul of the sbt build to decouple it from pom files,
>> I'm not sure that's the best place to start, given that we need to continue
>> to support maven - the coupling is intentional. But getting involved in the
>> build in general would be completely welcome.
>>
>> - Patrick
>>
>> On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:
>>
>>> Maven isn't 'legacy', or supported for the benefit of third parties.
>>> SBT had some behaviors / problems that Maven didn't relative to what
>>> Spark needs. SBT is a development-time alternative only, and partly
>>> generated from the Maven build.
>>>
>>> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
>>> > People who do upstream builds of spark (think bigtop and hadoop
>>> distros) are
>>> > used to legacy systems like maven, so maven is the default build. I
>>> don't
>>> > think it will change.
>>> >
>>> > Any improvements for the sbt build are of course welcome (it is still
>>> used
>>> > by many developers), but i would not do anything that increases the
>>> burden
>>> > of maintaining two build systems.
>>> >
>>> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
>>> >>
>>> >> Hi everyone,
>>> >> in the process of learning Spark, I wanted to get an overview of the
>>> >> interaction between all of its sub-projects. I therefore decided to
>>> have a
>>> >> look at the build setup and its dependency management.
>>> >> Since I am a lot more comfortable using sbt than maven, I decided to
>>> try to
>>> >> port the maven configuration to sbt (with the help of automated
>>> tools).
>>> >> This led me to a couple of observations and questions on the build
>>> system
>>> >> design:
>>> >>
>>> >> First, currently, there are two build systems, maven and sbt. Is
>>> there a
>>> >> preferred tool (or future direction to one)?
>>> >>
>>> >> Second, the sbt build also uses maven "profiles" requiring the use of
>>> >> specific commandline parameters when starting sbt. Furthermore, since
>>> it
>>> >> relies on maven poms, dependencies to the scala binary version
>>> (_2.xx) are
>>> >> hardcoded and require running an external script when switching
>>> versions.
>>> >> Sbt could leverage built-in constructs to support cross-compilation
>>> and
>>> >> emulate profiles with configurations and new build targets. This would
>>> >> remove external state from the build (in that no extra steps need to
>>> be
>>> >> performed in a particular order to generate artifacts for a new
>>> >> configuration) and therefore improve stability and build
>>> reproducibility
>>> >> (maybe even build performance). I was wondering if implementing such
>>> >> functionality for the sbt build would be welcome?
>>> >>
>>> >> thanks,
>>> >> --Jakob
>>>

Re: State of the Build

2015-11-06 Thread Jakob Odersky
Reposting to the list...

Thanks for all the feedback everyone, I get a clearer picture of the
reasoning and implications now.

Koert, according to your post in this thread
http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023,
it is apparently very easy to change the maven resolution mechanism to the
ivy one.
Patrick, would this not help with the problems you described?

On 5 November 2015 at 23:23, Patrick Wendell  wrote:

> Hey Jakob,
>
> The builds in Spark are largely maintained by me, Sean, and Michael
> Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
> SBT build. Maven is the build of reference for packaging Spark and is used
> by many downstream packagers and to build all Spark releases. SBT is more
> often used by developers. Both builds inherit from the same pom files (and
> rely on the same profiles) to minimize maintenance complexity of Spark's
> very complex dependency graph.
>
> If you are looking to make contributions that help with the build, I am
> happy to point you towards some things that are consistent maintenance
> headaches. There are two major pain points right now that I'd be thrilled
> to see fixes for:
>
> 1. SBT relies on a different dependency conflict resolution strategy than
> maven - causing all kinds of headaches for us. I have heard that newer
> versions of SBT can (maybe?) use Maven as a dependency resolver instead of
> Ivy. This would make our life so much better if it were possible, either by
> virtue of upgrading SBT or somehow doing this ourselves.
>
> 2. We don't have a great way of auditing the net effect of dependency
> changes when people make them in the build. I am working on a fairly clunky
> patch to do this here:
>
> https://github.com/apache/spark/pull/8531
>
> It could be done much more nicely using SBT, but only provided (1) is
> solved.
>
> Doing a major overhaul of the sbt build to decouple it from pom files, I'm
> not sure that's the best place to start, given that we need to continue to
> support maven - the coupling is intentional. But getting involved in the
> build in general would be completely welcome.
>
> - Patrick
>
> On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:
>
>> Maven isn't 'legacy', or supported for the benefit of third parties.
>> SBT had some behaviors / problems that Maven didn't relative to what
>> Spark needs. SBT is a development-time alternative only, and partly
>> generated from the Maven build.
>>
>> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
>> > People who do upstream builds of spark (think bigtop and hadoop
>> distros) are
>> > used to legacy systems like maven, so maven is the default build. I
>> don't
>> > think it will change.
>> >
>> > Any improvements for the sbt build are of course welcome (it is still
>> used
>> > by many developers), but i would not do anything that increases the
>> burden
>> > of maintaining two build systems.
>> >
>> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
>> >>
>> >> Hi everyone,
>> >> in the process of learning Spark, I wanted to get an overview of the
>> >> interaction between all of its sub-projects. I therefore decided to
>> have a
>> >> look at the build setup and its dependency management.
>> >> Since I am a lot more comfortable using sbt than maven, I decided to
>> try to
>> >> port the maven configuration to sbt (with the help of automated tools).
>> >> This led me to a couple of observations and questions on the build
>> system
>> >> design:
>> >>
>> >> First, currently, there are two build systems, maven and sbt. Is there
>> a
>> >> preferred tool (or future direction to one)?
>> >>
>> >> Second, the sbt build also uses maven "profiles" requiring the use of
>> >> specific commandline parameters when starting sbt. Furthermore, since
>> it
>> >> relies on maven poms, dependencies to the scala binary version (_2.xx)
>> are
>> >> hardcoded and require running an external script when switching
>> versions.
>> >> Sbt could leverage built-in constructs to support cross-compilation and
>> >> emulate profiles with configurations and new build targets. This would
>> >> remove external state from the build (in that no extra steps need to be
>> >> performed in a particular order to generate artifacts for a new
>> >> configuration) and therefore improve stability and build
>> reproducibility
>> >> (maybe even build performance). I was wondering if implementing such
>> >> functionality for the sbt build would be welcome?
>> >>
>> >> thanks,
>> >> --Jakob
>>
>>
>>
>


Re: State of the Build

2015-11-05 Thread Patrick Wendell
Hey Jakob,

The builds in Spark are largely maintained by me, Sean, and Michael
Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
SBT build. Maven is the build of reference for packaging Spark and is used
by many downstream packagers and to build all Spark releases. SBT is more
often used by developers. Both builds inherit from the same pom files (and
rely on the same profiles) to minimize maintenance complexity of Spark's
very complex dependency graph.

If you are looking to make contributions that help with the build, I am
happy to point you towards some things that are consistent maintenance
headaches. There are two major pain points right now that I'd be thrilled
to see fixes for:

1. SBT relies on a different dependency conflict resolution strategy than
maven - causing all kinds of headaches for us. I have heard that newer
versions of SBT can (maybe?) use Maven as a dependency resolver instead of
Ivy. This would make our life so much better if it were possible, either by
virtue of upgrading SBT or somehow doing this ourselves.
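
(If the upgrade route fails, one partial workaround available in sbt 0.13,
sketched here rather than offered as a drop-in fix, is to make conflicts loud
instead of silently resolved, and then pin each one by hand.)

    // Fail the build on any version conflict instead of taking Ivy's
    // "latest wins" default; each conflict must then be settled with an
    // explicit dependencyOverrides entry. This approximates, but does not
    // replicate, Maven's "nearest wins" strategy.
    conflictManager := ConflictManager.strict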

2. We don't have a great way of auditing the net effect of dependency
changes when people make them in the build. I am working on a fairly clunky
patch to do this here:

https://github.com/apache/spark/pull/8531

It could be done much more nicely using SBT, but only provided (1) is
solved.
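
(A lighter-weight audit in the meantime, assuming the third-party
sbt-dependency-graph plugin; the version below is illustrative.)

    // project/plugins.sbt: adds a dependencyTree task for inspecting the
    // resolved dependency graph.
    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")

(Snapshot the output of dependencyTree before and after a dependency change,
then diff the two files to see the net effect.)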

Doing a major overhaul of the sbt build to decouple it from pom files, I'm
not sure that's the best place to start, given that we need to continue to
support maven - the coupling is intentional. But getting involved in the
build in general would be completely welcome.

- Patrick

On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:

> Maven isn't 'legacy', or supported for the benefit of third parties.
> SBT had some behaviors / problems that Maven didn't relative to what
> Spark needs. SBT is a development-time alternative only, and partly
> generated from the Maven build.
>
> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
> > People who do upstream builds of spark (think bigtop and hadoop distros)
> are
> > used to legacy systems like maven, so maven is the default build. I don't
> > think it will change.
> >
> > Any improvements for the sbt build are of course welcome (it is still
> used
> > by many developers), but i would not do anything that increases the
> burden
> > of maintaining two build systems.
> >
> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
> >>
> >> Hi everyone,
> >> in the process of learning Spark, I wanted to get an overview of the
> >> interaction between all of its sub-projects. I therefore decided to
> have a
> >> look at the build setup and its dependency management.
> >> Since I am a lot more comfortable using sbt than maven, I decided to try
> to
> >> port the maven configuration to sbt (with the help of automated tools).
> >> This led me to a couple of observations and questions on the build
> system
> >> design:
> >>
> >> First, currently, there are two build systems, maven and sbt. Is there a
> >> preferred tool (or future direction to one)?
> >>
> >> Second, the sbt build also uses maven "profiles" requiring the use of
> >> specific commandline parameters when starting sbt. Furthermore, since it
> >> relies on maven poms, dependencies to the scala binary version (_2.xx)
> are
> >> hardcoded and require running an external script when switching
> versions.
> >> Sbt could leverage built-in constructs to support cross-compilation and
> >> emulate profiles with configurations and new build targets. This would
> >> remove external state from the build (in that no extra steps need to be
> >> performed in a particular order to generate artifacts for a new
> >> configuration) and therefore improve stability and build reproducibility
> >> (maybe even build performance). I was wondering if implementing such
> >> functionality for the sbt build would be welcome?
> >>
> >> thanks,
> >> --Jakob
>
>
>


Re: State of the Build

2015-11-05 Thread Sean Owen
Maven isn't 'legacy', or supported for the benefit of third parties.
SBT had some behaviors / problems that Maven didn't relative to what
Spark needs. SBT is a development-time alternative only, and partly
generated from the Maven build.

On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
> People who do upstream builds of spark (think bigtop and hadoop distros) are
> used to legacy systems like maven, so maven is the default build. I don't
> think it will change.
>
> Any improvements for the sbt build are of course welcome (it is still used
> by many developers), but i would not do anything that increases the burden
> of maintaining two build systems.
>
> On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
>>
>> Hi everyone,
>> in the process of learning Spark, I wanted to get an overview of the
>> interaction between all of its sub-projects. I therefore decided to have a
>> look at the build setup and its dependency management.
>> Since I am a lot more comfortable using sbt than maven, I decided to try to
>> port the maven configuration to sbt (with the help of automated tools).
>> This led me to a couple of observations and questions on the build system
>> design:
>>
>> First, currently, there are two build systems, maven and sbt. Is there a
>> preferred tool (or future direction to one)?
>>
>> Second, the sbt build also uses maven "profiles" requiring the use of
>> specific commandline parameters when starting sbt. Furthermore, since it
>> relies on maven poms, dependencies to the scala binary version (_2.xx) are
>> hardcoded and require running an external script when switching versions.
>> Sbt could leverage built-in constructs to support cross-compilation and
>> emulate profiles with configurations and new build targets. This would
>> remove external state from the build (in that no extra steps need to be
>> performed in a particular order to generate artifacts for a new
>> configuration) and therefore improve stability and build reproducibility
>> (maybe even build performance). I was wondering if implementing such
>> functionality for the sbt build would be welcome?
>>
>> thanks,
>> --Jakob




Re: State of the Build

2015-11-05 Thread Koert Kuipers
People who do upstream builds of spark (think bigtop and hadoop distros)
are used to legacy systems like maven, so maven is the default build. I
don't think it will change.

Any improvements for the sbt build are of course welcome (it is still used
by many developers), but i would not do anything that increases the burden
of maintaining two build systems.
On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:

> Hi everyone,
> in the process of learning Spark, I wanted to get an overview of the
> interaction between all of its sub-projects. I therefore decided to have a
> look at the build setup and its dependency management.
> Since I am a lot more comfortable using sbt than maven, I decided to try to
> port the maven configuration to sbt (with the help of automated tools).
> This led me to a couple of observations and questions on the build system
> design:
>
> First, currently, there are two build systems, maven and sbt. Is there a
> preferred tool (or future direction to one)?
>
> Second, the sbt build also uses maven "profiles" requiring the use of
> specific commandline parameters when starting sbt. Furthermore, since it
> relies on maven poms, dependencies to the scala binary version (_2.xx) are
> hardcoded and require running an external script when switching versions.
> Sbt could leverage built-in constructs to support cross-compilation and
> emulate profiles with configurations and new build targets. This would
> remove external state from the build (in that no extra steps need to be
> performed in a particular order to generate artifacts for a new
> configuration) and therefore improve stability and build reproducibility
> (maybe even build performance). I was wondering if implementing such
> functionality for the sbt build would be welcome?
>
> thanks,
> --Jakob
>


Re: State of the Build

2015-11-05 Thread Mark Hamstra
There was a lot of discussion that preceded our arriving at this statement
in the Spark documentation: "Maven is the official build tool recommended
for packaging Spark, and is the build of reference."
https://spark.apache.org/docs/latest/building-spark.html#building-with-sbt

I'm not aware of anything new in the way of SBT tooling, or in your post,
Jakob, that would lead us to reconsider the choice of Maven over SBT for
the reference build of Spark. Of course, I'm by no means the sole and
final authority on the matter, but at least I am not seeing anything in
your suggested approach that hasn't already been considered. You're
welcome to review the prior discussion and try to convince us that we've
made the wrong choice, but I wouldn't expect that to be a quick and easy
process.


On Thu, Nov 5, 2015 at 4:44 PM, Ted Yu  wrote:

> See previous discussion:
> http://search-hadoop.com/m/q3RTtPnPnzwOhBr
>
> FYI
>
> On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch  wrote:
>
>> Yes. The current dev/change-scala-version.sh mutates (/pollutes) the
>> build environment by updating the pom.xml in each of the subprojects. If
>> you were able to come up with a structure that avoids that approach it
>> would be an improvement.
>>
>> 2015-11-05 15:38 GMT-08:00 Jakob Odersky :
>>
>>> Hi everyone,
>>> in the process of learning Spark, I wanted to get an overview of the
>>> interaction between all of its sub-projects. I therefore decided to have a
>>> look at the build setup and its dependency management.
>> Since I am a lot more comfortable using sbt than maven, I decided to try
>>> to port the maven configuration to sbt (with the help of automated tools).
>>> This led me to a couple of observations and questions on the build
>>> system design:
>>>
>>> First, currently, there are two build systems, maven and sbt. Is there a
>>> preferred tool (or future direction to one)?
>>>
>>> Second, the sbt build also uses maven "profiles" requiring the use of
>>> specific commandline parameters when starting sbt. Furthermore, since it
>>> relies on maven poms, dependencies to the scala binary version (_2.xx) are
>>> hardcoded and require running an external script when switching versions.
>>> Sbt could leverage built-in constructs to support cross-compilation and
>>> emulate profiles with configurations and new build targets. This would
>>> remove external state from the build (in that no extra steps need to be
>>> performed in a particular order to generate artifacts for a new
>>> configuration) and therefore improve stability and build reproducibility
>>> (maybe even build performance). I was wondering if implementing such
>>> functionality for the sbt build would be welcome?
>>>
>>> thanks,
>>> --Jakob
>>>
>>
>>
>


Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion:
http://search-hadoop.com/m/q3RTtPnPnzwOhBr

FYI

On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch  wrote:

> Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build
> environment by updating the pom.xml in each of the subprojects. If you were
> able to come up with a structure that avoids that approach it would be an
> improvement.
>
> 2015-11-05 15:38 GMT-08:00 Jakob Odersky :
>
>> Hi everyone,
>> in the process of learning Spark, I wanted to get an overview of the
>> interaction between all of its sub-projects. I therefore decided to have a
>> look at the build setup and its dependency management.
>> Since I am a lot more comfortable using sbt than maven, I decided to try
>> to port the maven configuration to sbt (with the help of automated tools).
>> This led me to a couple of observations and questions on the build system
>> design:
>>
>> First, currently, there are two build systems, maven and sbt. Is there a
>> preferred tool (or future direction to one)?
>>
>> Second, the sbt build also uses maven "profiles" requiring the use of
>> specific commandline parameters when starting sbt. Furthermore, since it
>> relies on maven poms, dependencies to the scala binary version (_2.xx) are
>> hardcoded and require running an external script when switching versions.
>> Sbt could leverage built-in constructs to support cross-compilation and
>> emulate profiles with configurations and new build targets. This would
>> remove external state from the build (in that no extra steps need to be
>> performed in a particular order to generate artifacts for a new
>> configuration) and therefore improve stability and build reproducibility
>> (maybe even build performance). I was wondering if implementing such
>> functionality for the sbt build would be welcome?
>>
>> thanks,
>> --Jakob
>>
>
>


Re: State of the Build

2015-11-05 Thread Stephen Boesch
Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build
environment by updating the pom.xml in each of the subprojects. If you were
able to come up with a structure that avoids that approach it would be an
improvement.
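
(A minimal sketch of the built-in alternative being suggested, with example
version numbers: sbt's cross-building selects the Scala version per
invocation, so nothing on disk has to be mutated.)

    // build.sbt: "+compile" builds for every listed version in turn, and
    // "++2.11.7 compile" switches for a single invocation; no pom rewriting,
    // no leftover state in the working tree.
    crossScalaVersions := Seq("2.10.5", "2.11.7")
    scalaVersion := crossScalaVersions.value.head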

2015-11-05 15:38 GMT-08:00 Jakob Odersky :

> Hi everyone,
> in the process of learning Spark, I wanted to get an overview of the
> interaction between all of its sub-projects. I therefore decided to have a
> look at the build setup and its dependency management.
> Since I am a lot more comfortable using sbt than maven, I decided to try to
> port the maven configuration to sbt (with the help of automated tools).
> This led me to a couple of observations and questions on the build system
> design:
>
> First, currently, there are two build systems, maven and sbt. Is there a
> preferred tool (or future direction to one)?
>
> Second, the sbt build also uses maven "profiles" requiring the use of
> specific commandline parameters when starting sbt. Furthermore, since it
> relies on maven poms, dependencies to the scala binary version (_2.xx) are
> hardcoded and require running an external script when switching versions.
> Sbt could leverage built-in constructs to support cross-compilation and
> emulate profiles with configurations and new build targets. This would
> remove external state from the build (in that no extra steps need to be
> performed in a particular order to generate artifacts for a new
> configuration) and therefore improve stability and build reproducibility
> (maybe even build performance). I was wondering if implementing such
> functionality for the sbt build would be welcome?
>
> thanks,
> --Jakob
>