Stage vs. StageInfo

2013-07-23 Thread Mark Hamstra
So I'm currently working in Spark's DAGScheduler and related UI code, and I'm finding myself wondering why there are StageInfos distinct from Stages. It seems like we go through some bookkeeping to make sure that we can get from a Stage to a StageInfo, which in turn is just a pairing of the Stage
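For readers without the scheduler code at hand, here is a minimal sketch of the pattern under discussion -- a mutable scheduler-internal Stage versus an immutable StageInfo snapshot that can be published to listeners and the UI. The names mirror the thread, but the fields and shapes are purely illustrative, not Spark's actual definitions.

```scala
object StageInfoSketch {
  // Illustrative only -- not Spark's actual Stage class.
  class Stage(val id: Int, val numTasks: Int) {
    var pendingTasks: Int = numTasks   // mutated by the scheduler as tasks finish
  }

  // Immutable snapshot that can be handed to SparkListeners and the UI
  // without exposing (or retaining) the mutable scheduler-internal object.
  case class StageInfo(stageId: Int, numTasks: Int, pendingTasks: Int)

  def snapshot(s: Stage): StageInfo = StageInfo(s.id, s.numTasks, s.pendingTasks)
}
```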

Re: Stage vs. StageInfo

2013-07-23 Thread Mark Hamstra
l the SparkListener events yet but the goal is to do so. > > Matei > > On Jul 23, 2013, at 4:22 PM, Mark Hamstra wrote: > > > So I'm currently working in Spark's DAGScheduler and related UI code, and > > I'm finding myself wondering why there are StageInfos

Re: Keeping users mailing list with Google groups

2013-08-20 Thread Mark Hamstra
Is Google groups email formatting completely compatible with Apache lists? On Tue, Aug 20, 2013 at 11:43 AM, Henry Saputra wrote: > We'll let the other mentors chime in and give their thoughts. > I have been looking at the Apache policy about public listing [1] and could > not find anything abo

RDDs with no partitions

2013-08-22 Thread Mark Hamstra
So how do these get created, and are we really handling them correctly? What is prompting my questions is that I'm looking at making sure that the various data structures in the DAGScheduler shrink when appropriate instead of growing without bounds. Jobs with no partitions and the "zero split job

Re: RDDs with no partitions

2013-08-22 Thread Mark Hamstra
tions real things that we actually have to deal with? On Thu, Aug 22, 2013 at 9:20 PM, Reynold Xin wrote: > Being the guy that added the empty partition rdd, I second your idea that > we should just short-circuit those in DAGScheduler.runJob. > > > > > On Thu, Aug 22, 2013 at

Re: off-heap RDDs

2013-08-25 Thread Mark Hamstra
I'd need to see a clear and significant advantage to using off-heap RDDs directly within Spark vs. leveraging Tachyon. What worries me is the combinatoric explosion of different caching and persistence mechanisms. With too many of these, not only will users potentially be baffled (@user-list: "Wh

Re: off-heap RDDs

2013-08-25 Thread Mark Hamstra
ate storage level. > One simple way to accomplish this is for the user application to pass Spark > a DirectByteBuffer. > > > > > On Sun, Aug 25, 2013 at 6:06 PM, Mark Hamstra >wrote: > > > I'd need to see a clear and significant advantage to using off-heap RD
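The quoted proposal mentions passing Spark a DirectByteBuffer; that Spark-facing call is only a proposal in this thread, but for context, here is a minimal sketch of what off-heap allocation via java.nio looks like:

```scala
import java.nio.ByteBuffer

// Allocate 64 MB outside the JVM heap. The buffer contents are not subject
// to GC; only the small ByteBuffer wrapper object lives on the heap.
val direct: ByteBuffer = ByteBuffer.allocateDirect(64 * 1024 * 1024)
direct.putLong(0, 42L)        // write at absolute offset 0
println(direct.getLong(0))    // read it back: 42
```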

Re: Release process?

2013-08-30 Thread Mark Hamstra
Doc PRs were still open 4 hours ago. At least for some of us. On Fri, Aug 30, 2013 at 3:37 PM, Evan Chan wrote: > Hey guys, > > What is the schedule for the 0.8 release? > > In general, will the dev community be notified of code freeze, testing > deadli

Re: [Licensing check] Spark 0.8.0-incubating RC1

2013-09-03 Thread Mark Hamstra
So are you planning to release 0.8 from the master branch (which is at a106ed8... now) or from branch-0.8? On Mon, Sep 2, 2013 at 6:49 PM, Matei Zaharia wrote: > By the way, this tar file also corresponds to commit > a106ed8b97e707b36818c11d1d7211fa28636178 in our Apache git repo. > > Matei > >

Re: Upgrading to latest Spray Milestone

2013-09-03 Thread Mark Hamstra
Spark doesn't use Spray anymore. On Tue, Sep 3, 2013 at 11:04 AM, Gary Malouf wrote: > The spray team did some major refactors between M2 and M8. Has anyone > tried to upgrade spark for this? I am looking at working on it this week, > but wanted to see if anyone had already taken it on. > > G

Re: [Licensing check] Spark 0.8.0-incubating RC1

2013-09-03 Thread Mark Hamstra
What is going to be the process for making pull requests? Can they be made against the github mirror (https://github.com/apache/incubator-spark), or must we use some other way? On Tue, Sep 3, 2013 at 10:28 AM, Matei Zaharia wrote: > Hi guys, > > > So are you planning to release 0.8 from the mas

Re: [Licensing check] Spark 0.8.0-incubating RC1

2013-09-03 Thread Mark Hamstra
ject that did this: > http://wiki.apache.org/cordova/ContributorWorkflow. > > Matei > > On Sep 3, 2013, at 10:39 AM, Mark Hamstra wrote: > > > What is going to be the process for making pull requests? Can they be > made > > against the github mirror (ht

Re: Spark 0.8.0-incubating RC2

2013-09-05 Thread Mark Hamstra
Are these RCs not getting tagged in the repository, or am I just not looking in the right place? On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell wrote: > Hey All, > > Matei asked me to pick this up because he's travelling this week. I > cut a second release candidate from the head of the 0.8 b

Port 3030 conflict

2013-09-10 Thread Mark Hamstra
I should have tracked it down earlier when I started seeing persistent UISuite failures, but I finally got around to looking at it today. We've got an annoying port conflict problem. We're setting SparkUI.DEFAULT_PORT to 3030, which is the same port that Typesafe has chosen as the default Nailgun

Re: Port 3030 conflict

2013-09-10 Thread Mark Hamstra
On Tue, Sep 10, 2013 at 04:32PM, Mark Hamstra wrote: > > I should have tracked it down earlier when I started seeing persistent > > UISuite failures, but I finally got around to looking at it today. We've > > got an annoying port conflict problem. We're setting >

Maven SCM trouble

2013-09-11 Thread Mark Hamstra
I'm in the process of trying to build Apache-fied Spark into our stack at ClearStory. We've been using Maven to do Debian packaging as found in the repl-bin module for quite a while, but that doesn't work now. To see the failure, you need to build repl-bin with the `deb` profile -- e.g. "mvn -Pre

Re: Maven SCM trouble

2013-09-11 Thread Mark Hamstra
saying the pom.xml from parent Apache pom.xml overrides the > Spark definition instead? > > > > > On Wed, Sep 11, 2013 at 3:04 PM, Mark Hamstra > wrote: > > I'm in the process of trying to build Apache-fied Spark into our stack at > > ClearStory. We'

Re: Maven SCM trouble

2013-09-11 Thread Mark Hamstra
lease-plugin -- did your prepare & release use -Phadoop2-yarn,repl-bin, Patrick? On Wed, Sep 11, 2013 at 10:31 PM, Mark Hamstra wrote: > Yes, exactly. If I comment out the reference to the parent Apache pom, > then the buildnumber plugin works correctly. Similarly if I leave the >

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-13 Thread Mark Hamstra
re community know that there are missing pieces in the > release artifacts proposed by Spark's RE (Patrick) > > Thanks, > > Henry > > > On Thu, Sep 12, 2013 at 10:57 PM, Mark Hamstra > wrote: > > Yeah, that may get tricky, because the check of the tests in the

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-13 Thread Mark Hamstra
n a bit which addresses Mark's comments (though > please continue to provide feedback on this one!). > > Suresh - it's signed with the following key: > > http://people.apache.org/~pwendell/9E4FE3AF.asc > > > > On Fri, Sep 13, 2013 at 11:28 AM, Mark Hams

Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Mark Hamstra
Whoa there, cowboy! It's just a warning, and removing parameterized artifactIds (or versions) is a significant change in functionality that necessitates changes in user behavior. At a minimum, this needs to be discussed before we go this route. If we really want to get rid of the warnings right

Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Mark Hamstra
ter branch if you'd > like. This is just a fix to bring the 0.8 branch into line with our > existing releases (and since 0.8 only supports scala 2.9.3 anyways, > I'm still not sure how this could affect any users adversely). > > - Patrick > > On Sun, Sep 15, 2013 at 5:41 PM

Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Mark Hamstra
problem in the future. If > someone presents a compelling reason, we'll have to think about > whether we can keep publishing them like this, since this is not > technically a valid maven format. > > - Patrick > > On Sun, Sep 15, 2013 at 6:46 PM, Mark Hamstra > wrot

Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Mark Hamstra
ing the scala > version in branch 0.8.0 build? It just seems like the overall simplest > solution for now. Or would this cause a large problem for you guys? > > We can solve this on master for 0.9, I didn't touch master at all wrt > the maven build. > > - Patrick > >

Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Mark Hamstra
we change the release format. > > - Patrick > > On Sun, Sep 15, 2013 at 7:49 PM, Mark Hamstra > wrote: > > As long as its just a simple replacement of parameters, it's not too hard > > to live with in the short term. It's only if we let things fester and I &

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-17 Thread Mark Hamstra
There are a few nits left to pick: 'sbt/sbt publish-local' isn't generating correct POM files because of the way the exclusions are defined in SparkBuild.scala using wildcards; looks like there may be some broken doc links generated in that task, as well; DriverSuite doesn't like to run from the ma
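For anyone unfamiliar with the wildcard-exclusion issue mentioned above, here is a hedged sketch in build.sbt syntax (SparkBuild.scala uses the full Scala build definition, but the declarations have the same shape; the coordinates below are illustrative, not the actual lines from SparkBuild.scala):

```scala
// A wildcard exclusion: drops every artifact from the organization.
// sbt resolves this fine, but it does not map cleanly onto the
// groupId/artifactId pairs that a generated Maven POM's <exclusions> needs.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" excludeAll(
  ExclusionRule(organization = "org.codehaus.jackson")
)

// An explicit exclusion: names both groupId and artifactId, which
// translates directly into a well-formed Maven <exclusion>.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" exclude(
  "org.codehaus.jackson", "jackson-core-asl")
```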

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-18 Thread Mark Hamstra
so I assume we are getting +1 from you? ;) > > On Tue, Sep 17, 2013 at 1:26 PM, Mark Hamstra > wrote: > > There are a few nits left to pick: 'sbt/sbt publish-local' isn't > generating > > correct POM files because of the way the exclusions are defined in >

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-18 Thread Mark Hamstra
"not enough", of course. On Wed, Sep 18, 2013 at 10:38 AM, Mark Hamstra wrote: > If that will make the difference between releasing and not releasing, then > sure; else I'd rather see the last niggles resolved, but that's enough for > me to vote -1 -- so I'm

Re: Experimental Scala-2.10.3 branch based on master

2013-10-04 Thread Mark Hamstra
Yeah, sorry to say, but I think you've largely or completely duplicated work that has already been done. If anything, the state of Prashant's current work is mostly ahead of yours since, among other things, he has already incorporated the changes I made to use ClassTag. On Fri, Oct 4, 2013 at 1

Re: Experimental Scala-2.10.3 branch based on master

2013-10-04 Thread Mark Hamstra
That's the idea. The fact is that there are still some significant chunks of pending development over in the old repo. For at least a little while longer, it's probably a good idea to post your intentions here before embarking on any large project. That way we can point out any po

Re: Want to Contribute to spark

2013-10-04 Thread Mark Hamstra
Whether you are contributing large or small changes, pull requests through github (as outlined previously) are by far the preferred method. On Fri, Oct 4, 2013 at 1:09 PM, hilfi alkaff wrote: > Hi, > > Related to this thread, I would also like to start contributing to > open-source but I curren

Re: Want to Contribute to spark

2013-10-04 Thread Mark Hamstra
I work on a > specific bug in the JIRA tickets (e.g. do I need to send notification to > the mailing list that I am working in this bug, etc)? Sorry if this is an > obvious question. > > > On Fri, Oct 4, 2013 at 3:18 PM, Mark Hamstra >wrote: > > > Whether you are contrib

Re: A quick note about spark-class (the launch script)

2013-10-09 Thread Mark Hamstra
https://github.com/apache/incubator-spark/commit/f3c60c9c0ce1f0c94174ecb903519b5fc4c696cd#diff-96515a7165082eff6dfecf69581c7349 Already fixed for 0.8.1 On Wed, Oct 9, 2013 at 4:36 AM, Markus Losoi wrote: > Hi > > Should the strings "spark.deploy.x.y" be "org.apache.spark.deploy.x.y" > (e.g.,

Re: Test coverage of Spark

2013-10-12 Thread Mark Hamstra
There is also spark-perf. On Sat, Oct 12, 2013 at 2:22 PM, Christopher Nguyen wrote: > Roman, an area I think would (a) have high impact, and (b) is relatively > not well covered is performance analysis. I'm sure most teams are doing > this internally at t

Does ExecutorRunner.buildJavaOpts work the way we want it to?

2013-10-14 Thread Mark Hamstra
I'm busy working on upgrading an application stack of which Spark and Shark are components. The 0.8.0 changes in how configuration, environment variables, and SPARK_JAVA_OPTS are handled are giving me some trouble, but I'm not sure whether it is just my trouble or a more general trouble with Execu

Are we moving too fast or too far on 0.8.1-SNAPSHOT?

2013-10-28 Thread Mark Hamstra
Or more to the point: What is our commitment to backward compatibility in point releases? Many Java developers will come to a library or platform versioned as x.y.z with the expectation that if their own code worked well using x.y.(z-1) as a dependency, then moving up to x.y.z will be painless and

Re: Are we moving too fast or too far on 0.8.1-SNAPSHOT?

2013-10-28 Thread Mark Hamstra
ynold Xin wrote: > Hi Mark, > > I can't comment much on the Spark part right now (because I have to run in > 3 mins), but we will make Shark 0.8.1 work with Spark 0.8.1 for sure. Some > of the changes will get cherry picked into branch-0.8 of Shark. > > > On Mon, Oct

Re: Are we moving too fast or too far on 0.8.1-SNAPSHOT?

2013-10-28 Thread Mark Hamstra
lam wrote: > > > I agree that we should strive to maintain full backward compatibility > > between patch releases (i.e. incrementing the "z" in "version x.y.z"). > > > > On Mon, Oct 28, 2013 at 3:22 PM, Mark Hamstra > wrote: > >> Or more to

Re: Getting failures in FileServerSuite

2013-10-30 Thread Mark Hamstra
What JDK version are you using, Evan? I tried to reproduce your problem earlier today, but I wasn't even able to get through the assembly build -- kept hanging when trying to build the examples assembly. Foregoing the assembly and running the tests would hang on FileServerSuite "Dynamically adding

Re: Getting failures in FileServerSuite

2013-10-30 Thread Mark Hamstra
personally today, but just a suggestion for ppl > > for whom this is blocking progress. > > > > - Patrick > > > > On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra > > wrote: > > > What JDK version on you using, Evan? > > > > > > I tried to

Re: a question about lineage graphs in streaming

2013-11-02 Thread Mark Hamstra
You're coming at the paper from a different context than that in which it was written. The paper doesn't claim that RDD lineage and state could grow indefinitely after the Spark Streaming changes were made. That growth was indefinite in early, pre-Streaming versions of Spark, however. On Sat,

Re: a question about lineage graphs in streaming

2013-11-02 Thread Mark Hamstra
n > code base is not what figure 3 in paper describes, so I have no idea what > application figure 3 is talking about. > > Mark, sorry I don't quite understand what you've said. > > thanks, > dachuan. > > > On Sat, Nov 2, 2013 at 4:35 PM, Mark Hamstra >wrote:

Re: what's the strategy for code sync between branches e.g. scala-2.10 v.s. master?

2013-11-04 Thread Mark Hamstra
Rebasing changes the SHAs, which isn't a good idea in a public and heavily-used repository. On Mon, Nov 4, 2013 at 1:04 AM, Liu, Raymond wrote: > Hi > It seems to me that dev branches are sync with master by keep > merging trunk codes. E.g. scala-2.10 branches continuously merge latest

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc1)

2013-12-07 Thread Mark Hamstra
Not sure. I haven't been able to discern any pattern as to what new code goes into both 0.9 and 0.8 vs. what goes only into 0.8, so I can't really tell whether 0.8.1 is done or if something has been overlooked and not cherry-picked from master. On Sat, Dec 7, 2013 at 3:24 PM, Patrick Wendell wr

Re: [DISCUSS] About the [VOTE] Release Apache Spark 0.8.1-incubating (rc1)

2013-12-08 Thread Mark Hamstra
I'm aware of the changes file, but it really doesn't address the issue that I am raising. The changes file just tells me what has gone into the release candidate. In general, it doesn't tell me why those changes went in or provide any rationale by which to judge whether that is the complete set o

Re: [DISCUSS] About the [VOTE] Release Apache Spark 0.8.1-incubating (rc1)

2013-12-08 Thread Mark Hamstra
ease, run the tests, run the release in your dev environment, read > through the documentation, etc. This is one of the main points of > releasing an RC to the community... even if you disagree with some > patches that were merged in, this is still a way you can help validate > the release.

Re: [DISCUSS] About the [VOTE] Release Apache Spark 0.8.1-incubating (rc1)

2013-12-08 Thread Mark Hamstra
t things like > writing a custom RDD or SparkListener. These may change in major versions, > but at least you’ll be able to expect that maintenance releases in the > original branch don’t break them. > > Matei > > On Dec 8, 2013, at 2:45 PM, Mark Hamstra wrote: > > >

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
SPARK-962 should be resolved before release. See also: https://github.com/apache/incubator-spark/pull/195 With the references to the way I changed Debian packaging for ClearStory, we should be at least 90% of the way toward doing it right for Apache. On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wend

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
Probably not blockers, but there are still some non-deterministic test failures -- e.g. streaming CheckpointSuite. On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra wrote: > SPARK-962 should be resolved before release. See also: > https://github.com/apache/incubator-spark/pull/195 > &

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
> Does merging that particular PR put this in sufficient shape for the > > 0.8.1 release or are there other open patches we need to look at? > > > > - Patrick > > > > On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra > wrote: > >> SPARK-962 should be resolved b

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
) it works according to that approach. > > We can redesign this packaging in 0.9. That will require having a PR > against Apache Spark, discussing, etc. But it doesn't need to be on > the critical path for this release. > > - Patrick > > On Sun, Dec 8, 2013 at 7:54 PM

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
he branch 0.8 let's just > merge that to keep it consistent with our docs and the way this is > done in 0.8.0 > > We can do a broader refactoring in 0.9. Would be great if you could > kick off a JIRA discussion or submit a PR relating to that. > > - Patrick > > On Sun,

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Mark Hamstra
And I think 195 is sufficient to build something that works; but I haven't personally tested it. On Sun, Dec 8, 2013 at 8:41 PM, Mark Hamstra wrote: > Well, what I've already done for ClearStory is very close to how Debian > packaging should be done for Apache Spark. That much

Re: Spark API - support for asynchronous calls - Reactive style [I]

2013-12-09 Thread Mark Hamstra
Spark has already supported async jobs for a while now -- https://github.com/apache/incubator-spark/pull/29, and they even work correctly after https://github.com/apache/incubator-spark/pull/232. There are now implicit conversions from RDD to AsyncRDDActions
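A minimal sketch of the async API being referenced, assuming a build recent enough to include the two pull requests above (the implicit conversion lives in the SparkContext companion object):

```scala
import org.apache.spark.{FutureAction, SparkContext}
import org.apache.spark.SparkContext._              // RDD -> AsyncRDDActions implicit
import scala.concurrent.ExecutionContext.Implicits.global

val sc  = new SparkContext("local[2]", "async-demo")
val rdd = sc.parallelize(1 to 1000000, 8)

// Returns immediately with a FutureAction instead of blocking the caller.
val countF: FutureAction[Long] = rdd.countAsync()

// FutureAction is a scala.concurrent.Future, so the usual combinators apply,
// and the underlying job can also be cancelled via countF.cancel().
countF.onComplete(result => println(s"count finished: $result"))
```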

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-11 Thread Mark Hamstra
Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes a long, long, long time to complete (an MBP, in my case), building three separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time. On Wed

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-11 Thread Mark Hamstra
> > > > Matei > > On Dec 11, 2013, at 1:04 AM, Prashant Sharma > wrote: > >> I hope this PR https://github.com/apache/incubator-spark/pull/252 can > help. > >> Again this is not a blocker for the release from my side either. > >> &

Re: Spark API - support for asynchronous calls - Reactive style [I]

2013-12-12 Thread Mark Hamstra
I'm having fun with it. And it's almost all Reynold's work, so I can't take credit for it. On Thu, Dec 12, 2013 at 8:56 AM, Evan Chan wrote: > Mark, > > Thanks. The FutureAction API looks awesome. > > > On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra >

Option folding idiom

2013-12-26 Thread Mark Hamstra
In code added to Spark over the past several months, I'm glad to see more use of `foreach`, `for`, `map` and `flatMap` over `Option` instead of pattern matching boilerplate. There are opportunities to push `Option` idioms even further now that we are using Scala 2.10 in master, but I want to discu
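As a concrete illustration of the idiom under discussion, a small sketch contrasting pattern-matching boilerplate with map/getOrElse and with Option.fold (available as of Scala 2.10); the names are made up for the example:

```scala
val maybeName: Option[String] = Some("spark")

// Pattern-matching boilerplate:
val greeting1 = maybeName match {
  case Some(name) => s"hello, $name"
  case None       => "hello, anonymous"
}

// The same thing with map + getOrElse:
val greeting2 = maybeName.map(name => s"hello, $name").getOrElse("hello, anonymous")

// The same thing with Option.fold -- note the default comes first,
// the Some-handling function second:
val greeting3 = maybeName.fold("hello, anonymous")(name => s"hello, $name")
```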

Re: Option folding idiom

2013-12-26 Thread Mark Hamstra
ot as intuitive. What about the following, which > are more readable? > > option.map { a => someFuncMakesB() } > .getOrElse(b) > > option.map { a => someFuncMakesB() } > .orElse { a => otherDefaultB() }.get > > > On Thu, De

Re: Option folding idiom

2013-12-27 Thread Mark Hamstra
; On Thu, Dec 26, 2013 at 7:58 PM, Reynold Xin > wrote: > >> > >>> I'm not strongly against Option.fold, but I find the readability > getting > >>> worse for the use case you brought up. For the use case of if/else, I > >> find > >>>

Re: how to set up environment to develop Spark on macbook pro retina?

2014-01-05 Thread Mark Hamstra
I don't understand: What are you finding different in developing on OSX vs. Linux that requires tweaking? On Sun, Jan 5, 2014 at 11:12 AM, q q wrote: > Hello, > I have a macbook pro retina, wondering how people with similar model set > up development environment: > 1. if I choose to install a l

Re: how to set up environment to develop Spark on macbook pro retina?

2014-01-05 Thread Mark Hamstra
You don't need gcc to build Spark, and there is very little difference between developing and running Spark and Shark on OSX vs. Linux. In fact, I dare say that most of the Spark committers develop primarily on MBPs and run it at different times on their local OSX machine and on physical or virtua

Re: how to set up environment to develop Spark on macbook pro retina?

2014-01-05 Thread Mark Hamstra
Some do work with Spark as much as possible within IntelliJ or Eclipse (or Sublime), but I'm not one of them. I work closer to your second option, with some Emacs and deployment tooling also thrown into the mix. In short, you need to find someone else to answer questions about how to most fully u

Re: GC tuning for Spark

2014-01-16 Thread Mark Hamstra
And, of course, there are the bigger-hammer-than-GC-tuning approaches using some combination of unchecked, off-heap and Tachyon. On Thu, Jan 16, 2014 at 11:54 AM, Tathagata Das wrote: > There are a bunch of tricks noted in the Tuning > Guide< > http://spark.incubator.apache.org/docs/latest/tuni

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-17 Thread Mark Hamstra
+1 LGTM On Wed, Jan 15, 2014 at 5:48 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark > (incubating) version 0.9.0. > > A draft of the release notes along with the changes file is attached > to this e-mail. > > The tag to be voted on is v0.9.0-incuba

Re: Config properties broken in master

2014-01-18 Thread Mark Hamstra
Really? Disabling config files seems to me to be a bigger/more onerous change for users than spark.speculation=true|false => spark.speculation.enabled=true|false and spark.locality.wait => spark.locality.wait.default. On Sat, Jan 18, 2014 at 11:36 AM, Matei Zaharia wrote: > This is definitely a

Re: Config properties broken in master

2014-01-18 Thread Mark Hamstra
m not actually sure it’s a feature we want to > support, compared to say just a SparkConf.fromFile method that reads a Java > Properties file. > > Matei > > On Jan 18, 2014, at 12:01 PM, Mark Hamstra > wrote: > > > Really? Disabling config files seems to

Re: Config properties broken in master

2014-01-18 Thread Mark Hamstra
the next major > release. > > On Sat, Jan 18, 2014 at 12:17 PM, Mark Hamstra > wrote: > > That later release should be at least 0.10.0, then, since use of config > > files won't be backward compatible with 0.9.0. > > > > > > On Sat, Jan 18, 2014 at 12

Re: Config properties broken in master

2014-01-18 Thread Mark Hamstra
Hah! Stupid English language -- by "fixed" I mean established/stabilized, not repaired. On Sat, Jan 18, 2014 at 12:42 PM, Mark Hamstra wrote: > Yeah, I can get on board with that -- gives us another chance to > re-think/re-work config files to address the limitations Matei me

Re: [DISCUSS] Graduating as a TLP

2014-01-23 Thread Mark Hamstra
+1 What are the requirements and responsibilities of the VP? (Or a link to where they are laid out.) On Thu, Jan 23, 2014 at 3:34 PM, Patrick Wendell wrote: > +1 > > On Thu, Jan 23, 2014 at 3:23 PM, Sean McNamara > wrote: > > +1 > > > > On 1/23/14, 4:21 PM, "Tathagata Das" > wrote: > > > >>

Re: [DISCUSS] Graduating as a TLP

2014-01-23 Thread Mark Hamstra
Is there any other choice that makes any kind of sense? Matei for VP: +1. On Thu, Jan 23, 2014 at 4:32 PM, Andy Konwinski wrote: > +2 (1 for graduating + 1 for matei as VP)! > > > On Thu, Jan 23, 2014 at 4:11 PM, Chris Mattmann > wrote: > > > +1 from me. > > > > I'll throw Matei's name into th

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-25 Thread Mark Hamstra
+1 On Sat, Jan 25, 2014 at 2:37 PM, Andy Konwinski wrote: > +1 > > > On Sat, Jan 25, 2014 at 2:27 PM, Reynold Xin wrote: > > > +1 > > > > > On Jan 25, 2014, at 12:07 PM, Hossein wrote: > > > > > > +1 > > > > > > Compiled and tested on Mavericks. > > > > > > --Hossein > > > > > > > > > On Sat,

Re: GroupByKey implementation.

2014-01-26 Thread Mark Hamstra
groupByKey does merge the values associated with the same key in different partitions: scala> val rdd = sc.parallelize(List(1, 1, 1, 1), 4).mapPartitionsWithIndex((idx, itr) => List(("foo", idx -> math.random),("bar", idx -> math.random)).toIterator) scala> rdd.collect.foreach(println) (foo,(0,0
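The REPL transcript above is cut off by the archive; here is a complete sketch of the same demonstration for spark-shell (where sc and the pair-RDD implicits are already in scope). The random values will of course differ from run to run:

```scala
val rdd = sc.parallelize(List(1, 1, 1, 1), 4).mapPartitionsWithIndex { (idx, itr) =>
  List(("foo", idx -> math.random), ("bar", idx -> math.random)).toIterator
}

// Each of the 4 partitions contributes one ("foo", ...) and one ("bar", ...) pair.
rdd.collect.foreach(println)
// e.g. (foo,(0,0.61...)) (bar,(0,0.27...)) (foo,(1,0.93...)) ...

// groupByKey merges the values for the same key across all partitions:
rdd.groupByKey().collect.foreach(println)
// e.g. (foo,ArrayBuffer((0,0.61...), (1,0.93...), (2,...), (3,...)))
//      (bar,ArrayBuffer((0,0.27...), (1,...), (2,...), (3,...)))
```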

Re: GroupByKey implementation.

2014-01-26 Thread Mark Hamstra
, Archit Thakur wrote: > Which spark version are you on? > > > On Mon, Jan 27, 2014 at 3:12 AM, Mark Hamstra >wrote: > > > groupByKey does merge the values associated with the same key in > different > > partitions: > > > > scala> val rdd = sc.parall

Re: Moving to Typesafe Config?

2014-01-27 Thread Mark Hamstra
Been done and undone, and will probably be redone for 1.0. See https://mail.google.com/mail/ca/u/0/#search/config/143a6c39e3995882 On Mon, Jan 27, 2014 at 7:58 AM, Heiko Braun wrote: > > Is there any interest in moving to a more structured approach for > configuring spark components? I.e. movin

Re: Moving to Typesafe Config?

2014-01-27 Thread Mark Hamstra
And it would be more helpful if I gave you a usable link http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html Sent from my iPhone > On Jan 27, 2014, at 8:13 AM, Heiko Braun wrote: > > Thanks Mark. > >> On 27 Jan 2014, at 1

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Looks good. One question and one comment: How are Alpha components and higher level libraries which may add small features within a maintenance release going to be marked with that status? Somehow/somewhere within the code itself, or just as some kind of external reference? I would strongly enc

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Yup, the intended merge level is just a hint; the responsibility still lies with the committers. It can be a helpful hint, though. On Wed, Feb 5, 2014 at 4:55 PM, Patrick Wendell wrote: > > How are Alpha components and higher level libraries which may add small > > features within a maintenanc

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Mark Hamstra
Imran: > Its also fine with me if 1.0 is next, I just think that we ought to be > asking these kinds of questions up and down the entire api before we > release 1.0. And moving master to 1.0.0-SNAPSHOT doesn't preclude that. If anything, it turns that "ought to" into "must" -- which is another

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Mark Hamstra
I'm not sure that that is the conclusion that I would draw from the Hadoop example. I would certainly agree that maintaining and supporting both an old and a new API is a cause of endless confusion for users. If we are going to change or drop things from the API to reach 1.0, then we shouldn't be

Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Mark Hamstra
I know that it can be done -- which is different from saying that I know how to set it up. > On Feb 8, 2014, at 2:57 PM, Henry Saputra wrote: > > Patrick, do you know if there is a way to check if a Github PR's > subject/ title contains JIRA number and will raise warning by the > Jenkins? > >

Re: proposal: replace lift-json with spray-json

2014-02-09 Thread Mark Hamstra
The JSON handling in the Rapture I/O library is also pretty interesting, but I have no idea what its performance is now or is likely to be, and code maturity is an issue with this project. On Sun, Feb 9, 2014 at 3:06 PM, Pascal Voitot Dev < pascal.voitot@gmail.

Re: [VOTE] Graduation of Apache Spark from the Incubator

2014-02-10 Thread Mark Hamstra
e Spark Project; and be it further > RESOLVED, that the persons listed immediately below be and > hereby are appointed to serve as the initial members of the > Apache Spark Project: > > * Mosharaf Chowdhury > * Jason Dai > * Tathagata Das > * Ankur Dave > * Aaron Dav

Re: Proposal for JIRA and Pull Request Policy

2014-02-11 Thread Mark Hamstra
Whether that is a good idea or not depends largely, I think, on who will have control of that putative Apache Jenkins. The AMPLab-now-mostly-Databricks guys (i.e. Andy and others) who did the work of setting up, configuring and maintaining the current Jenkins have done a great job and have been ve

Re: [GitHub] incubator-spark pull request: SPARK-1078: Replace lift-json with j...

2014-02-11 Thread Mark Hamstra
> > The situation sounds fine for the next minor release... I don't understand what you mean by this. According to my current understanding, the next release of Spark other than maintenance releases on 0.9.x is intended to be a major release, 1.0.0, and there are no plans for an intervening mino

Re: [GitHub] incubator-spark pull request: SPARK-1078: Replace lift-json with j...

2014-02-11 Thread Mark Hamstra
Sounds good, now that we are all clear on what we mean. Didn't mean to be a dick, just was a little confused on what you meant. On Tue, Feb 11, 2014 at 8:08 PM, Patrick Wendell wrote: > I think Aaron just meant 1.0.0 by "the next minor release". > > On Tue, Feb 1

Re: Proposal: Clarifying minor points of Scala style

2014-02-12 Thread Mark Hamstra
It's actually a little more complicated than that, mostly due to the difference between private and private[this]. Allow me to demonstrate: package dummy class Foo1(a: Int, b: Int) { private val c = a + b } class Foo2(a: Int, b: Int) { private[this] val c = a + b } class Foo3(a: Int, b: In
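The demonstration is flattened and truncated by the archive; here is a hedged reconstruction of the kind of comparison being made (Foo3's body is cut off in the original, so the version below is an assumption chosen to illustrate the point raised later in the thread about fields springing into existence):

```scala
package dummy

class Foo1(a: Int, b: Int) {
  // private val: the compiler emits a field *and* a private accessor method c().
  private val c = a + b
}

class Foo2(a: Int, b: Int) {
  // private[this] val: only a field is emitted, with no accessor method;
  // all access is direct to the field of this instance.
  private[this] val c = a + b
}

class Foo3(a: Int, b: Int) {
  // Referencing a plain constructor parameter from a method body makes the
  // compiler quietly retain it as a field of the class -- the "fields
  // secretly springing into existence" mentioned in the follow-up.
  def sum: Int = a + b
}
```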

Re: Proposal: Clarifying minor points of Scala style

2014-02-12 Thread Mark Hamstra
the Scala private getter won't affect serialization and makes > the shadowing explicit. Other than some weird uses of reflection, I'm not > sure how the difference could cause an issue. > > > On Wed, Feb 12, 2014 at 2:02 PM, Mark Hamstra >wrote: > > > It

Re: Proposal: Clarifying minor points of Scala style

2014-02-12 Thread Mark Hamstra
nt to avoid fields secretly > springing into existence or we want the significantly more concise syntax > of unannotated parameters. If we do want the former, then "private" versus > "private[this]" is the next question. > > > On Wed, Feb 12, 2014 at 2:34 PM, Mark H

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-20 Thread Mark Hamstra
Is dropping Maven an option, or must we have it to comply with the Apache release process? On Thu, Feb 20, 2014 at 8:03 PM, Patrick Wendell wrote: > Hey All, > > It's very high overhead having two build systems in Spark. Before > getting into a long discussion about the merits of sbt vs maven,

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-21 Thread Mark Hamstra
As long as the SBT build doesn't start depending on some new functionality that doesn't have an easy analog in Maven, the canonical build being done only via SBT doesn't make too much difference to me. Regardless, I'm going to need to continue to support customized builds that fit into my Maven-iz

Re: [DISCUSS] Extending public API

2014-02-22 Thread Mark Hamstra
I'm also curious what the vetting process will be for this spark-contrib code? Does inclusion in spark-contrib mean that it has received some sort of review and official blessing, or is contrib just a dumping ground for code of questionable quality, utility, maintenance, etc.? On Sat, Feb 22, 20