Re: A proposal for Spark 2.0

2015-12-25 Thread Tao Wang
How about the Hive dependency? We use the ThriftServer, SerDes, and even the parser/execution logic from Hive. Where will this part be headed?

Re: A proposal for Spark 2.0

2015-12-23 Thread Nicholas Chammas
Yeah, I'd also favor maintaining docs with strictly temporary relevance on JIRA when possible. The wiki is like this weird backwater I only rarely visit. Don't we typically do this kind of stuff with an umbrella issue on JIRA? Tom, wouldn't that work well for you? Nick On Wed, Dec 23, 2015 at 5:

Re: A proposal for Spark 2.0

2015-12-23 Thread Sean Owen
I think this will be hard to maintain; we already have JIRA as the de facto central place to store discussions and prioritize work, and the 2.x stuff is already a JIRA. The wiki doesn't really hurt, just probably will never be looked at again. Let's point people in all cases to JIRA. On Tue, Dec 2

Re: A proposal for Spark 2.0

2015-12-22 Thread Reynold Xin
I started a wiki page: https://cwiki.apache.org/confluence/display/SPARK/Development+Discussions On Tue, Dec 22, 2015 at 6:27 AM, Tom Graves wrote: > Do we have a summary of all the discussions and what is planned for 2.0 then? Perhaps we should put it on the wiki for reference. > Tom

Re: A proposal for Spark 2.0

2015-12-22 Thread Tom Graves
Do we have a summary of all the discussions and what is planned for 2.0 then? Perhaps we should put it on the wiki for reference. Tom On Tuesday, December 22, 2015 12:12 AM, Reynold Xin wrote: FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT.

Re: A proposal for Spark 2.0

2015-12-21 Thread Allen Zhang
Thanks for your quick response. OK, I will start a new thread with my thoughts. Thanks, Allen At 2015-12-22 15:19:49, "Reynold Xin" wrote: I'm not sure if we need special API support for GPUs. You can already use GPUs on individual executor nodes to build your own applications. If we want to

Re: A proposal for Spark 2.0

2015-12-21 Thread Reynold Xin
I'm not sure if we need special API support for GPUs. You can already use GPUs on individual executor nodes to build your own applications. If we want to leverage GPUs out of the box, I don't think the solution is to provide GPU specific APIs. Rather, we should just switch the underlying execution

Re: A proposal for Spark 2.0

2015-12-21 Thread Allen Zhang
plus dev On 2015-12-22 15:15:59, "Allen Zhang" wrote: Hi Reynold, Any new API support for GPU computing in our 2.0 new version? -Allen On 2015-12-22 14:12:50, "Reynold Xin" wrote: FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT.

Re: A proposal for Spark 2.0

2015-12-21 Thread Reynold Xin
FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT. On Tue, Nov 10, 2015 at 3:10 PM, Reynold Xin wrote: > I’m starting a new thread since the other one got intermixed with feature > requests. Please refrain from making feature request in this thread. Not > that we shouldn’t be add

Re: A proposal for Spark 2.0

2015-12-09 Thread kostas papageorgopoylos
Hi Kostas, With regards to your *second* point: I believe that requiring user apps to explicitly declare their dependencies is the clearest API approach when it comes to classpath and classloading. However, what about the following API: *SparkContext.addJar(String pathToJar)*? *Is this
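For reference, a minimal sketch of the addJar call under discussion, assuming Spark 1.x on the classpath; the app name and jar path are placeholders, not anything from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of SparkContext.addJar usage. addJar ships the jar to executors
// so tasks submitted through this SparkContext can load its classes; it
// does not rewire the driver's own classloader, which is part of the
// classpath-isolation question being raised in this thread.
object AddJarSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("addjar-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.addJar("/path/to/user-dependency.jar") // hypothetical path
    sc.stop()
  }
}
```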

Re: A proposal for Spark 2.0

2015-12-08 Thread Kostas Sakellis
I'd also like to make it a requirement that Spark 2.0 have a stable dataframe and dataset API - we should not leave these APIs experimental in the 2.0 release. We already know of at least one breaking change we need to make to dataframes, now's the time to make any other changes we need to stabiliz

Re: A proposal for Spark 2.0

2015-12-04 Thread Sean Owen
To be clearer, I don't think it's clear yet whether a 1.7 release should exist or not. I could see both making sense. It's also not really necessary to decide now, well before 1.6 is even out in the field. Deleting the version lost information, and I would not have done that given my reply.

Re: A proposal for Spark 2.0

2015-12-03 Thread Mridul Muralidharan
> Kostas > On Fri, Nov 13, 2015 at 12:26 PM, Mark Hamstra wrote:

Re: A proposal for Spark 2.0

2015-12-03 Thread Rad Gruchalski
> APIs in z releases. The Dataset API is experimental and so we might be changing the APIs before we declare it stable. This is why I think it is

Re: A proposal for Spark 2.0

2015-12-03 Thread Koert Kuipers
> APIs but can't move to Spark 2.0 because of the backwards incompatible changes, like removal of deprecated APIs, Scala 2.11 etc. > Kostas

Re: A proposal for Spark 2.0

2015-12-03 Thread Mark Hamstra
> ...removal of deprecated APIs, Scala 2.11 etc. > Kostas > On Fri, Nov 13, 2015 at 12:26 PM, Mark Hamstra wrote:

Re: A proposal for Spark 2.0

2015-12-03 Thread Sean Owen
Pardon for tacking on one more message to this thread, but I'm reminded of one more issue when building the RC today: Scala 2.10 does not in general try to work with Java 8, and indeed I can never fully compile it with Java 8 on Ubuntu or OS X, due to scalac assertion errors. 2.11 is the first that

Re: A proposal for Spark 2.0

2015-12-03 Thread Sean Owen
Kostas Sakellis wrote: > We have veered off the topic of Spark 2.0 a little bit here - yes we can talk about RDD vs. DS/DF more but let's refocus on Spark 2.0. I

Re: A proposal for Spark 2.0

2015-11-26 Thread Reynold Xin
> ...normal/default/standard way to do things in Spark. RDDs, in contrast, would be presented later as a kind of lower-level, closer-to-the-metal API that can be used in atypical, more...

Re: A proposal for Spark 2.0

2015-11-26 Thread Koert Kuipers
> ...complete control of partitioning, which is a key consideration at scale. While partitioning control is still piecemeal in DF/DS, it would seem premature to make RDDs a second-tier approach to Spark dev.

Re: A proposal for Spark 2.0

2015-11-26 Thread Steve Loughran
> On 25 Nov 2015, at 08:54, Sandy Ryza wrote: > > I see. My concern is / was that cluster operators will be reluctant to > upgrade to 2.0, meaning that developers using those clusters need to stay on > 1.x, and, if they want to move to DataFrames, essentially need to port their > app twice.

Re: A proposal for Spark 2.0

2015-11-26 Thread Sean Owen
> ...you may want to use the low-level RDD API while setting preservesPartitioning to true. Like this:

Re: A proposal for Spark 2.0

2015-11-25 Thread Reynold Xin
> I understand our goal for Spark 2.0 is to offer an easy transition but there will be users that won't be able to seamlessly upgrade given what we have discussed as in scope for 2.0. For these...

Re: A proposal for Spark 2.0

2015-11-25 Thread Sandy Ryza
> ...we have discussed as in scope for 2.0. For these users, having a 1.x release with these new features/APIs stabilized will be very beneficial. This might make Spark 1.7 a lighter release but that is not necessarily a bad thing.

Re: A proposal for Spark 2.0

2015-11-24 Thread Matei Zaharia
> ...in DF/DS. > I mean, we need to think about what kind of RDD APIs we have to provide to developers; maybe the fundamental API is enough, like the ShuffledRDD etc. But PairRDDFunctions probably not in this category, as we can do the same thing easily with DF/DS, even...

Re: A proposal for Spark 2.0

2015-11-24 Thread Sandy Ryza
> ...this timeline? > Kostas Sakellis > On Thu, Nov 12, 2015 at 8:39 PM, Cheng, Hao wrote: > Agree, more features/apis...

Re: A proposal for Spark 2.0

2015-11-23 Thread Reynold Xin
> ...On Thu, Nov 12, 2015 at 8:39 PM, Cheng, Hao wrote: > Agree, more features/apis/optimization need to be added in DF/DS. > I mean, we need to think about what kind of RDD APIs we have to...

Re: A proposal for Spark 2.0

2015-11-18 Thread Mark Hamstra
> ...maybe the fundamental API is enough, like the ShuffledRDD etc. But PairRDDFunctions probably not in this category, as we can do the same thing easily with DF/DS, even better performance. > From: Mark Hamstra...

Re: A proposal for Spark 2.0

2015-11-18 Thread Kostas Sakellis
> ...PairRDDFunctions probably not in this category, as we can do the same thing easily with DF/DS, even better performance. > From: Mark Hamstra [mailto:m...@clearstorydata.com] > Sent: Friday, November 13, 2015 11:23 AM

Re: A proposal for Spark 2.0

2015-11-15 Thread Prashant Sharma
Hey Matei, > Regarding Scala 2.12, we should definitely support it eventually, but I > don't think we need to block 2.0 on that because it can be added later too. > Has anyone investigated what it would take to run on there? I imagine we > don't need many code changes, just maybe some REPL stuff.

Re: A proposal for Spark 2.0

2015-11-14 Thread Steve Loughran
Producing new x.0 releases of open source projects is a recurrent problem: too radical a change means the old version gets updated anyway (Python 3), and an incompatible version stops uptake (for example, Log4j 2 dropping support for log4j.properties files). Similarly, any radical new feature does t

Re: A proposal for Spark 2.0

2015-11-13 Thread Mark Hamstra
> ...performance. > From: Mark Hamstra [mailto:m...@clearstorydata.com] > Sent: Friday, November 13, 2015 11:23 AM > To: Stephen Boesch > Cc: dev@spark.apache.org > Subject: Re: A proposal for Spark 2.0

Re: A proposal for Spark 2.0

2015-11-13 Thread Kostas Sakellis
> ...even better performance. > From: Mark Hamstra [mailto:m...@clearstorydata.com] > Sent: Friday, November 13, 2015 11:23 AM > To: Stephen Boesch > Cc: dev@spark.apache.org > Subject: Re: A proposal for Spark 2.0 > Hmmm... to me, that seems lik...

RE: A proposal for Spark 2.0

2015-11-12 Thread Cheng, Hao
thing easily with DF/DS, even better performance. From: Mark Hamstra [mailto:m...@clearstorydata.com] Sent: Friday, November 13, 2015 11:23 AM To: Stephen Boesch Cc: dev@spark.apache.org Subject: Re: A proposal for Spark 2.0 Hmmm... to me, that seems like precisely the kind of thing that argues for

Re: RE: A proposal for Spark 2.0

2015-11-12 Thread Guoqiang Li
/main/scala/com/github/cloudml/zen/graphx -- Original -- From: "Ulanov, Alexander" Date: Fri, Nov 13, 2015 01:44 AM To: "Nan Zhu"; "Guoqiang Li" Cc: "dev@spark.apache.org"; "Reynold Xin" Subject: RE:

Re: A proposal for Spark 2.0

2015-11-12 Thread Mark Hamstra
> ...API only?)? As lots of its functionality overlapping with DataFrame or DataSet. > Hao > From: Kostas Sakellis [mailto:kos...@cloudera.com] > Sent: Friday, November 13,...

Re: A proposal for Spark 2.0

2015-11-12 Thread Stephen Boesch
> Hao > From: Kostas Sakellis [mailto:kos...@cloudera.com] > Sent: Friday, November 13, 2015 5:27 AM > To: Nicholas Chammas > Cc: Ulanov, Alexander; Nan Zhu; wi...@qq.com; dev@spark.apache.org; Reynold Xin

Re: A proposal for Spark 2.0

2015-11-12 Thread Mark Hamstra
> ...Ulanov, Alexander; Nan Zhu; wi...@qq.com; dev@spark.apache.org; Reynold Xin > Subject: Re: A proposal for Spark 2.0 > I know we want to keep breaking changes to a minimum but I'm hoping that with Spark 2.0 we can also look at better classpath isolation w...

RE: A proposal for Spark 2.0

2015-11-12 Thread Cheng, Hao
DataFrame or DataSet. Hao From: Kostas Sakellis [mailto:kos...@cloudera.com] Sent: Friday, November 13, 2015 5:27 AM To: Nicholas Chammas Cc: Ulanov, Alexander; Nan Zhu; wi...@qq.com; dev@spark.apache.org; Reynold Xin Subject: Re: A proposal for Spark 2.0 I know we want to keep breaking changes to a

Re: A proposal for Spark 2.0

2015-11-12 Thread Kostas Sakellis
> ...allow GraphX to evolve with Tungsten. > Best regards, Alexander > From: Nan Zhu [mailto:zhunanmcg...@gmail.com] > Sent: Thursday, November 12, 2015 7:28 AM > To: wi...@qq.com > Cc: dev@spark.apache.org

Re: A proposal for Spark 2.0

2015-11-12 Thread Nicholas Chammas
> ...to deprecate the use of RDD in GraphX and switch to DataFrame. This will allow GraphX to evolve with Tungsten. > Best regards, Alexander > From: Nan Zhu [mailto:zhunanmcg...@gmail.com] > Sent: Thursday, November 12, 2015 7:28 AM > To: wi...@qq.com

RE: A proposal for Spark 2.0

2015-11-12 Thread Ulanov, Alexander
regards, Alexander From: Nan Zhu [mailto:zhunanmcg...@gmail.com] Sent: Thursday, November 12, 2015 7:28 AM To: wi...@qq.com Cc: dev@spark.apache.org Subject: Re: A proposal for Spark 2.0 Being specific to Parameter Server, I think the current agreement is that PS shall exist as a third-party library

Re: A proposal for Spark 2.0

2015-11-12 Thread Nan Zhu
Being specific to the Parameter Server, I think the current agreement is that PS shall exist as a third-party library instead of a component of the core code base, isn't it? Best, -- Nan Zhu http://codingcat.me On Thursday, November 12, 2015 at 9:49 AM, wi...@qq.com wrote: > Who has the idea of...

Re: A proposal for Spark 2.0

2015-11-12 Thread witgo
Does anyone have ideas about machine learning? Spark is missing some features for machine learning, for example a parameter server. > On Nov 12, 2015, at 05:32, Matei Zaharia wrote: > I like the idea of popping out Tachyon to an optional component too to reduce the number of dependencies. In the future, it...

Re: A proposal for Spark 2.0

2015-11-11 Thread Matei Zaharia
I like the idea of popping out Tachyon to an optional component too to reduce the number of dependencies. In the future, it might even be useful to do this for Hadoop, but it requires too many API changes to be worth doing now. Regarding Scala 2.12, we should definitely support it eventually, bu

Re: A proposal for Spark 2.0

2015-11-11 Thread hitoshi
Resending my earlier message because it wasn't accepted. I would like to add a proposal to upgrade jars when they do not break APIs and fix bugs. To be more specific, I would like to see Kryo upgraded from 2.21 to 3.x. Kryo 2.x has a bug (e.g. SPARK-7708) that is blocking its usage in production...

Re: A proposal for Spark 2.0

2015-11-11 Thread hitoshi
It looks like Chill is willing to upgrade their Kryo to 3.x if Spark and Hive will. As it is now Spark, Chill, and Hive have Kryo jar but it really can't be used because Kryo 2 can't serdes some classes. Since Spark 2.0 is a major release, it really would be nice if we can resolve the Kryo issue.
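For context, the Kryo under discussion sits behind Spark's optional serializer. A minimal configuration sketch, assuming spark-core on the classpath; the app name and registered class are examples, not from the thread:

```scala
import org.apache.spark.SparkConf

// Example class a user might register with Kryo for compact serialization.
case class ExamplePoint(x: Double, y: Double)

// Configuration sketch: opting into Kryo (Kryo 2.21 via Chill in current
// Spark) and registering classes. An upgrade of the underlying Kryo to 3.x
// changes the wire format, which is why Spark, Chill, and Hive would need
// to move together as discussed above.
object KryoConfSketch {
  val conf = new SparkConf()
    .setAppName("kryo-sketch")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(Array(classOf[ExamplePoint]))
}
```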

Re: A proposal for Spark 2.0

2015-11-11 Thread Jonathan Kelly
If Scala 2.12 will require Java 8 and we want to enable cross-compiling Spark against Scala 2.11 and 2.12, couldn't we just make Java 8 a requirement if you want to use Scala 2.12? On Wed, Nov 11, 2015 at 9:29 AM, Koert Kuipers wrote: > i would drop scala 2.10, but definitely keep java 7 > > cro
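The cross-compiling being discussed is typically expressed in an sbt build. A hypothetical build.sbt fragment; version numbers are placeholders, since Scala 2.12 was not yet released at the time:

```scala
// Hypothetical build.sbt sketch: cross-building one project against Scala
// 2.11 and 2.12. Because Scala 2.12 targets Java 8 bytecode, the 2.12 axis
// of the cross-build implicitly requires a Java 8 toolchain, while the 2.11
// axis can still be built with Java 7.
scalaVersion := "2.11.7"
crossScalaVersions := Seq("2.11.7", "2.12.0") // 2.12.0 is a placeholder

// Running `sbt +package` builds artifacts for every version listed above.
```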

Re: A proposal for Spark 2.0

2015-11-11 Thread Koert Kuipers
good point about dropping <2.2 for hadoop. you don't want to deal with protobuf 2.4, for example. On Wed, Nov 11, 2015 at 4:58 AM, Sean Owen wrote: > On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin wrote: > to the Spark community. A major release should not be very different from a minor re...

Re: A proposal for Spark 2.0

2015-11-11 Thread Koert Kuipers
i would drop scala 2.10, but definitely keep java 7. cross build for scala 2.12 is great, but i don't know how that works with the java 8 requirement; don't want to make java 8 mandatory. and probably stating the obvious, but a lot of apis got polluted due to the binary compatibility requirement. cleaning t...

Re: A proposal for Spark 2.0

2015-11-11 Thread Zoltán Zvara
Hi, Reconsidering the execution model behind Streaming would be a good candidate here, as Spark will not be able to provide the low latency and sophisticated windowing semantics that more and more use-cases will require. Maybe relaxing the strict batch model would help a lot. (Mainly this would hi

Re: A proposal for Spark 2.0

2015-11-11 Thread Tim Preece
Considering Spark 2.x will run for 2 years, would moving up to Scala 2.12 (pencilled in for Jan 2016) make any sense? - although that would then pre-req Java 8.

Re: A proposal for Spark 2.0

2015-11-11 Thread Sean Owen
On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin wrote: > to the Spark community. A major release should not be very different from a > minor release and should not be gated based on new features. The main > purpose of a major release is an opportunity to fix things that are broken > in the current A

Re: A proposal for Spark 2.0

2015-11-10 Thread Jean-Baptiste Onofré
Hi, I fully agree with that. Actually, I'm working on a PR to add "client" and "exploded" profiles to the Maven build. The client profile creates a spark-client-assembly jar, much more lightweight than the spark-assembly. In our case, we construct jobs that don't require all the spark server side. It...

Re: A proposal for Spark 2.0

2015-11-10 Thread Jean-Baptiste Onofré
Agree, it makes sense. Regards JB On 11/11/2015 01:28 AM, Reynold Xin wrote: Echoing Shivaram here. I don't think it makes a lot of sense to add more features to the 1.x line. We should still do critical bug fixes though. On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman mailto:shiva...@

Re: A proposal for Spark 2.0

2015-11-10 Thread Mark Hamstra
To take a stab at an example of something concrete and anticipatory, I can go back to something I mentioned previously. It's not really a good example because I don't mean to imply that I believe its premises are true, but try to go with it. If we were to decide that real-time, event-based...

Re: A proposal for Spark 2.0

2015-11-10 Thread Mark Hamstra
Heh... ok, I was intentionally pushing those bullet points to be extreme to find where people would start pushing back, and I'll agree that we do probably want some new features in 2.0 -- but I think we've got good agreement that new features aren't really the main point of doing a 2.0 release. I

Re: A proposal for Spark 2.0

2015-11-10 Thread Marcelo Vanzin
On Tue, Nov 10, 2015 at 6:51 PM, Reynold Xin wrote: > I think we are in agreement, although I wouldn't go to the extreme and say > "a release with no new features might even be best." > > Can you elaborate "anticipatory changes"? A concrete example or so would be > helpful. I don't know if that's

Re: A proposal for Spark 2.0

2015-11-10 Thread Reynold Xin
Mark, I think we are in agreement, although I wouldn't go to the extreme and say "a release with no new features might even be best." Can you elaborate "anticipatory changes"? A concrete example or so would be helpful. On Tue, Nov 10, 2015 at 5:19 PM, Mark Hamstra wrote: > I'm liking the way t

Re: A proposal for Spark 2.0

2015-11-10 Thread Sudhir Menon
Agree. If it is deprecated, get rid of it in 2.0. If the deprecation was a mistake, let's fix that. Suds Sent from my iPhone On Nov 10, 2015, at 5:04 PM, Reynold Xin wrote: Maybe a better idea is to un-deprecate an API if it is too important to not be removed. I don't think we can drop Java 7 s...

Re: A proposal for Spark 2.0

2015-11-10 Thread Mark Hamstra
I'm liking the way this is shaping up, and I'd summarize it this way (let me know if I'm misunderstanding or misrepresenting anything): - New features are not at all the focus of Spark 2.0 -- in fact, a release with no new features might even be best. - Remove deprecated API that we agree

Re: A proposal for Spark 2.0

2015-11-10 Thread Reynold Xin
Maybe a better idea is to un-deprecate an API if it is too important to not be removed. I don't think we can drop Java 7 support. It's way too soon. On Tue, Nov 10, 2015 at 4:59 PM, Mark Hamstra wrote: > Really, Sandy? "Extra consideration" even for already-deprecated API? If > we're not go

Re: A proposal for Spark 2.0

2015-11-10 Thread Mark Hamstra
Really, Sandy? "Extra consideration" even for already-deprecated API? If we're not going to remove these with a major version change, then just when will we remove them? On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza wrote: > Another +1 to Reynold's proposal. > > Maybe this is obvious, but I'd li

Re: A proposal for Spark 2.0

2015-11-10 Thread Sandy Ryza
Oh and another question - should Spark 2.0 support Java 7? On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza wrote: > Another +1 to Reynold's proposal. > > Maybe this is obvious, but I'd like to advocate against a blanket removal > of deprecated / developer APIs. Many APIs can likely be removed witho

Re: A proposal for Spark 2.0

2015-11-10 Thread Sandy Ryza
Another +1 to Reynold's proposal. Maybe this is obvious, but I'd like to advocate against a blanket removal of deprecated / developer APIs. Many APIs can likely be removed without material impact (e.g. the SparkContext constructor that takes preferred node location data), while others likely see
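The deprecate-then-remove lifecycle being debated can be illustrated with plain Scala; the names below are made up for illustration, not actual Spark APIs:

```scala
// Sketch of the deprecation lifecycle under discussion. A method deprecated
// during the 1.x line is a candidate for deletion in 2.0; "un-deprecating"
// an API that turns out to be too important is just dropping the annotation.
object LegacyApi {
  @deprecated("use wordCount instead", "1.4.0")
  def countWords(s: String): Int = wordCount(s)

  // The replacement API that 2.0 would keep.
  def wordCount(s: String): Int = s.split("\\s+").count(_.nonEmpty)
}
```

Callers of the deprecated method keep compiling with a warning, which is what gives downstream users a full minor-release cycle to migrate before a major version deletes it.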

Re: A proposal for Spark 2.0

2015-11-10 Thread Reynold Xin
Echoing Shivaram here. I don't think it makes a lot of sense to add more features to the 1.x line. We should still do critical bug fixes though. On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > +1 > > On a related note I think making it lightweight wi

Re: A proposal for Spark 2.0

2015-11-10 Thread Shivaram Venkataraman
+1 On a related note I think making it lightweight will ensure that we stay on the current release schedule and don't unnecessarily delay 2.0 to wait for new features / big architectural changes. In terms of fixes to 1.x, I think our current policy of back-porting fixes to older releases would st

Re: A proposal for Spark 2.0

2015-11-10 Thread Mridul Muralidharan
Would be also good to fix api breakages introduced as part of 1.0 (where there is missing functionality now), overhaul & remove all deprecated config/features/combinations, api changes that we need to make to public api which has been deferred for minor releases. Regards, Mridul On Tue, Nov 10, 2

Re: A proposal for Spark 2.0

2015-11-10 Thread Kostas Sakellis
+1 on a lightweight 2.0 What is the thinking around the 1.x line after Spark 2.0 is released? If not terminated, how will we determine what goes into each major version line? Will 1.x only be for stability fixes? Thanks, Kostas On Tue, Nov 10, 2015 at 3:41 PM, Patrick Wendell wrote: > I also f

Re: A proposal for Spark 2.0

2015-11-10 Thread Josh Rosen
There's a proposal / discussion of the assembly-less distributions at https://github.com/vanzin/spark/pull/2/files / https://issues.apache.org/jira/browse/SPARK-11157. On Tue, Nov 10, 2015 at 3:53 PM, Reynold Xin wrote: > > On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas < > nicholas.cham...@g

Re: A proposal for Spark 2.0

2015-11-10 Thread Reynold Xin
On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > > 3. Assembly-free distribution of Spark: don’t require building an > enormous assembly jar in order to run Spark. > > Could you elaborate a bit on this? I'm not sure what an assembly-free > distribution mea

Re: A proposal for Spark 2.0

2015-11-10 Thread Patrick Wendell
I also feel the same as Reynold. I agree we should minimize API breaks and focus on fixing things around the edge that were mistakes (e.g. exposing Guava and Akka) rather than any overhaul that could fragment the community. Ideally a major release is a lightweight process we can do every couple of

Re: A proposal for Spark 2.0

2015-11-10 Thread Nicholas Chammas
> For this reason, I would *not* propose doing major releases to break substantial API's or perform large re-architecting that prevent users from upgrading. Spark has always had a culture of evolving architecture incrementally and making changes - and I don't think we want to change this model. +1

A proposal for Spark 2.0

2015-11-10 Thread Reynold Xin
I’m starting a new thread since the other one got intermixed with feature requests. Please refrain from making feature request in this thread. Not that we shouldn’t be adding features, but we can always add features in 1.7, 2.1, 2.2, ... First - I want to propose a premise for how to think about S