Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
If we could minimize the external dependencies, it would certainly be beneficial long term. > On 06.02.2014 at 07:37, Mridul Muralidharan wrote: > > b) minimize external dependencies - some of them would go away / not be actively maintained.

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
If people feel that merging the intermediate SNAPSHOT number is significant, let's just defer merging that until this discussion concludes. That said - the decision to settle on 1.0 for the next release is not just because it happens to come after 0.9. It's a conscientious decision based on the…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mridul Muralidharan
Before we move to 1.0, we need to address two things: a) backward compatibility not just at the API level, but also at the binary level (not forcing a recompile); b) minimize external dependencies - some of them would go away / not be actively maintained. Regards, Mridul On Thu, Feb 6, 2014 at 11:50 AM, …

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Andy Konwinski
+1 for 0.10.0 now with the option to switch to 1.0.0 after further discussion. On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: > Agree on timeboxed releases as well. > > Is there a vision for where we want to be as a project before declaring the > first 1.0 release? While we're in the 0.x days per semver…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Andrew Ash
Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
+1 on time-boxed releases and compatibility guidelines > On 06.02.2014 at 01:20, Patrick Wendell wrote: > > Hi Everyone, > > In an effort to coordinate development amongst the growing list of > Spark contributors, I've taken some time to write up a proposal to > formalize various pieces of the development process…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
I would even take it further when it comes to PRs: - any PR needs to reference a JIRA - the PR should be rebased before submitting, to avoid merge commits - as Patrick said: require squashed commits /heiko > On 06.02.2014 at 01:39, Mark Hamstra wrote: > > I would strongly encourage that…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Yup, the intended merge level is just a hint; the responsibility still lies with the committers. It can be a helpful hint, though. On Wed, Feb 5, 2014 at 4:55 PM, Patrick Wendell wrote: > > How are Alpha components and higher level libraries which may add small > > features within a maintenance…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
> How are Alpha components and higher level libraries which may add small > features within a maintenance release going to be marked with that status? > Somehow/somewhere within the code itself, or just as some kind of external > reference? I think we'd mark alpha features as such in the java/scala…

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Looks good. One question and one comment: How are Alpha components and higher level libraries which may add small features within a maintenance release going to be marked with that status? Somehow/somewhere within the code itself, or just as some kind of external reference? I would strongly encourage…

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
Yes. You should be able to. Let's try to have future conversations through the u...@spark.incubator.apache.org mailing list :) On Wed, Feb 5, 2014 at 2:33 PM, Eduardo Costa Alfaia wrote: > So I could use reduceByKeyAndWindow like this > val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow…

Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate…

Re: Discussion on strategy or roadmap should happen on dev@ list

2014-02-05 Thread Matei Zaharia
Hey Henry, this makes sense. I'd like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we'd be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA…

Discussion on strategy or roadmap should happen on dev@ list

2014-02-05 Thread Henry Saputra
Hi Guys, Just a friendly reminder: some of you may work closely or collaborate outside the dev@ list, and sometimes that is easier. But, as part of an Apache Software Foundation project, any decision or outcome that could or will be implemented in Apache Spark needs to happen on the dev@ list as well…

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Eduardo Costa Alfaia
So I could use reduceByKeyAndWindow like this: val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) ? > The reduceByKeyAndWindow and other ***ByKey operations work only on > DStreams of key-value pairs. "Words" is a DStream[String], so it's not > key-value…

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
The reduceByKeyAndWindow and other ***ByKey operations work only on DStreams of key-value pairs. "Words" is a DStream[String], so it's not key-value pairs. "words.map(x => (x, 1))" is a DStream[(String, Int)] that has key-value pairs, so you can call reduceByKeyAndWindow. TD On Wed, Feb 5, 2014, …
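The transformation TD describes can be sketched in plain Scala, with no Spark dependency: over the batches that fall inside one window, all (word, 1) pairs are reduced per key with the supplied function (_ + _), which is what reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) does on a real DStream. The object and method names below are illustrative, not Spark API.

```scala
// Plain-Scala sketch (no Spark required) of the per-window reduction that
// words.map(x => (x, 1)).reduceByKeyAndWindow(_ + _, ...) performs.
object ReduceByKeyAndWindowSketch {
  // One inner Seq is the words received in one batch interval; the outer
  // Seq is the batches currently covered by the window.
  def reduceWindow(batches: Seq[Seq[String]]): Map[String, Int] =
    batches.flatten
      .map(w => (w, 1))                 // DStream[String] -> key-value pairs
      .groupBy(_._1)                    // group by key, like reduceByKey
      .map { case (w, ps) => (w, ps.map(_._2).reduce(_ + _)) }

  def main(args: Array[String]): Unit = {
    // e.g. a 30s window over a 10s batch interval covers 3 batches
    val window = Seq(Seq("spark", "rdd"), Seq("spark"), Seq("dstream"))
    println(reduceWindow(window))
  }
}
```

With a slide interval of Seconds(10), Spark would recompute this reduction every 10 seconds over the most recent 30 seconds of pairs.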

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Eduardo Costa Alfaia
Hi Tathagata, I am playing with NetworkWordCount.scala, I did some changes like this (in red): // Create the context with a 1 second batch size val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1), System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))…
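For reference, the per-batch logic of the NetworkWordCount example can be sketched in plain Scala: each 1-second micro-batch of lines from the socket is split into words and counted. The real program applies the same flatMap/map/count chain to a DStream built from socketTextStream; the object name here is illustrative.

```scala
// Plain-Scala sketch of what NetworkWordCount computes on one micro-batch
// of input lines (Spark applies this to every batch of the DStream).
object NetworkWordCountSketch {
  def wordCounts(batchLines: Seq[String]): Map[String, Int] =
    batchLines
      .flatMap(_.split(" "))            // lines -> words
      .filter(_.nonEmpty)               // drop empty tokens
      .map(w => (w, 1))                 // words -> (word, 1) pairs
      .groupBy(_._1)                    // per-key grouping
      .map { case (w, ps) => (w, ps.size) }

  def main(args: Array[String]): Unit = {
    println(wordCounts(Seq("hello spark", "hello streaming")))
  }
}
```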

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
Seems good to me. BTW, it's fine to use MEMORY_ONLY (i.e. without replication) for testing, but you should turn on replication if you want fault tolerance. TD On Mon, Feb 3, 2014 at 3:19 PM, Eduardo Costa Alfaia wrote: > Hi Tathagata, > > You were right when you said for me to use Scala against…