Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Patrick Wendell
for a smarter fat jar plugin. -Evan To be free is not merely to cast off one's chains, but to live in a way that respects and enhances the freedom of others. (#NelsonMandela) On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan mri...@gmail.com wrote: On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-28 Thread Patrick Wendell
Muralidharan mri...@gmail.com wrote: On Feb 26, 2014 11:12 PM, Patrick Wendell pwend...@gmail.com wrote: @mridul - As far as I know both Maven and Sbt use fairly similar processes for building the assembly/uber jar. We actually used to package spark with sbt and there were no specific issues

Updated Developer Docs

2014-03-04 Thread Patrick Wendell
Hey All, Just a heads up that there are a bunch of updated developer docs on the wiki including posting the dates around the current merge window. Some of the new docs might be useful for developers/committers: https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage Cheers, - Patrick

Re: Spark 0.9.0 and log4j

2014-03-08 Thread Patrick Wendell
Evan I actually remembered that Paul Brown (who also reported this issue) tested it and found that it worked. I'm going to merge this into master and branch 0.9, so please give it a spin when you have a chance. - Patrick On Sat, Mar 8, 2014 at 2:00 PM, Patrick Wendell pwend...@gmail.com wrote

Re: 0.9.0 forces log4j usage

2014-03-08 Thread Patrick Wendell
The fix for this was just merged into branch 0.9 (will be in 0.9.1+) and master. On Sun, Feb 9, 2014 at 11:44 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks Paul - it isn't meant to be a full solution but just a fix for the 0.9 branch - for the full solution there is another PR by Sean

Help vote for Spark talks at the Hadoop Summit

2014-03-13 Thread Patrick Wendell
Hey All, The Hadoop Summit uses community choice voting to decide which talks to feature. It would be great if the community could help vote for Spark talks so that Spark has a good showing at this event. You can make three votes on each track. Below I've listed Spark talks in each of the tracks

Github reviews now going to separate reviews@ mailing list

2014-03-16 Thread Patrick Wendell
Hey All, We've created a new list called revi...@spark.apache.org which will contain the contents from the github pull requests and comments. Note that these e-mails will no longer appear on the dev list. Thanks to Apache Infra for helping us set this up. To subscribe to this e-mail:

Re: repositories for spark jars

2014-03-17 Thread Patrick Wendell
Hey Nathan, I don't think this would be possible because there are at least dozens of permutations of Hadoop versions (different vendor distros X different versions X YARN vs not YARN, etc) and maybe hundreds. So publishing new artifacts for each would be really difficult. What is the exact

Re: Announcing the official Spark Job Server repo

2014-03-19 Thread Patrick Wendell
Evan - yep definitely open a JIRA. It would be nice to have a contrib repo set-up for the 1.0 release. On Tue, Mar 18, 2014 at 11:28 PM, Evan Chan e...@ooyala.com wrote: Matei, Maybe it's time to explore the spark-contrib idea again? Should I start a JIRA ticket? -Evan On Tue, Mar 18,

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Hey Tom, I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA in. The pyspark one I would consider more of an enhancement so might not be appropriate for a point release. Someone

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Hey Evan and TD, Spark's dependency graph in a maintenance release seems potentially harmful, especially upgrading a minor version (not just a patch version) like this. This could affect other downstream users. For instance, now without knowing their fastutil dependency gets bumped and they hit

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Spark's dependency graph in a maintenance *Modifying* Spark's dependency graph...

Re: Travis CI

2014-03-25 Thread Patrick Wendell
That's not correct - like Michael said the Jenkins build remains the reference build for now. On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu zhunanmcg...@gmail.com wrote: I assume the Jenkins is not working now? Best, -- Nan Zhu On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote:

Re: Travis CI

2014-03-25 Thread Patrick Wendell
just found that the Jenkins is not working from this afternoon for one PR, the first time build failed after 90 minutes, the second time it has run for more than 2 hours, no result is returned Best, -- Nan Zhu On Tuesday, March 25, 2014 at 10:06 PM, Patrick Wendell wrote: That's

Re: Spark 0.9.1 release

2014-03-26 Thread Patrick Wendell
Hey TD, This one we just merged into master this morning: https://spark-project.atlassian.net/browse/SPARK-1322 It should definitely go into the 0.9 branch because there was a bug in the semantics of top() which at this point is unreleased in Python. I didn't backport it yet because I figured

Re: JIRA. github and asf updates

2014-03-29 Thread Patrick Wendell
Mridul, You can unsubscribe yourself from any of these sources, right? - Patrick On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan mri...@gmail.com wrote: Hi, So we are now receiving updates from three sources for each change to the PR. While each of them handles a corner case which

Could you undo the JIRA dev list e-mails?

2014-03-29 Thread Patrick Wendell
Hey Chris, I don't think our JIRA has been fully migrated to Apache infra, so it's really confusing to send people e-mails referring to the new JIRA since we haven't announced it yet. There is some content there because we've been trying to do the migration, but I'm not sure it's entirely

Re: Could you undo the JIRA dev list e-mails?

2014-03-29 Thread Patrick Wendell
Okay I think I managed to revert this by just removing jira@a.o from our dev list. On Sat, Mar 29, 2014 at 11:37 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Chris, I don't think our JIRA has been fully migrated to Apache infra, so it's really confusing to send people e-mails referring

Re: JIRA. github and asf updates

2014-03-29 Thread Patrick Wendell
problem to have - a vibrant and very actively engaged community generated a lot of meaningful traffic! I just don't want to get distracted from it by repetitions. Regards, Mridul On Sat, Mar 29, 2014 at 11:46 PM, Patrick Wendell pwend...@gmail.com wrote: Ah sorry I see - Jira updates are going

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread Patrick Wendell
TD - I downloaded and did some local testing. Looks good to me! +1 You should cast your own vote - at that point it's enough to pass. - Patrick On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com wrote: +1 tested on Ubuntu12.04 64bit On Mon, Mar 31, 2014 at 3:56 AM, Matei

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Patrick Wendell
3 days, it makes it really hard for anyone who is offline for the weekend to try it out. Either that or extend the voting for more than 3 days. Tom On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com wrote: TD - I downloaded and did some local testing. Looks good to me

Re: sbt-package-bin

2014-04-01 Thread Patrick Wendell
And there is a deb target as well - ah didn't see Mark's email. On Tue, Apr 1, 2014 at 11:36 AM, Patrick Wendell pwend...@gmail.com wrote: Ya there is already some fragmentation here. Maven has some dist targets and there is also ./make-distribution.sh. On Tue, Apr 1, 2014 at 11:31 AM

Re: sbt-package-bin

2014-04-01 Thread Patrick Wendell
Ya there is already some fragmentation here. Maven has some dist targets and there is also ./make-distribution.sh. On Tue, Apr 1, 2014 at 11:31 AM, Mark Hamstra m...@clearstorydata.com wrote: A basic Debian package can already be created from the Maven build: mvn -Pdeb ... On Tue, Apr 1,

Re: Would anyone mind having a quick look at PR#288?

2014-04-02 Thread Patrick Wendell
Hey Evan, Ya thanks this is a pretty small patch. Should definitely be do-able for 1.0. - Patrick On Wed, Apr 2, 2014 at 10:25 AM, Evan Chan e...@ooyala.com wrote: https://github.com/apache/spark/pull/288 It's for fixing SPARK-1154, which would help Spark be a better citizen for most

Re: Recent heartbeats

2014-04-04 Thread Patrick Wendell
I answered this over on the user list... On Fri, Apr 4, 2014 at 6:13 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, Also posted it on user but then I realized it might be more involved. In my ALS runs I am noticing messages that complain about heart beats: 14/04/04 20:43:09 WARN

Re: Flaky streaming tests

2014-04-07 Thread Patrick Wendell
TD - do you know what is going on here? I looked into this a bit and at least a few of these use Thread.sleep() and assume the sleep will be exact, which is wrong. We should disable all the tests that do and probably they should be re-written to virtualize time. - Patrick On Mon, Apr 7,

Re: It seems that jenkins for PR is not working

2014-04-15 Thread Patrick Wendell
There are a few things going on here wrt tests. 1. I fixed up the RAT issues with a hotfix. 2. The Hive tests were actually disabled for a while accidentally. A recent fix correctly re-enabled them. Without Hive Spark tests run in about 40 minutes and with Hive it runs in 1 hour and 15 minutes,

Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
Hey All, This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
What are the expectations / guarantees on binary compatibility between 0.9 and 1.0? There are no guarantees.

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
) at spark.activator.WordCount2$.main(WordCount2.scala:42) at spark.activator.WordCount2.main(WordCount2.scala) ... Thoughts? On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, This is not an official vote, but I wanted to cut an RC so that people can test

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
actually exist? Thanks, Dean On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell pwend...@gmail.com wrote: Hi Dean, We always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't over-write an existing directory because it can

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
I added a fix for this recently and it didn't require adding -J notation - are you trying it with this patch? https://issues.apache.org/jira/browse/SPARK-1654 ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b" scala> sys.props.get("foo") res0: Option[String] = Some(a) scala> sys.props.get("bar")

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell pwend...@gmail.com wrote: I added a fix for this recently and it didn't require adding -J notation - are you trying it with this patch? https://issues.apache.org/jira/browse/SPARK-1654 ./bin/spark-shell --driver-java-options -Dfoo=a -Dbar

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
it. Pretty sure we aren't passing these argument arrays around correctly in bash. On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin van...@cloudera.com wrote: On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah I think the problem is that the spark-submit script doesn't pass

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
org.apache.spark.deploy.SparkSubmit $ORIG_ARGS +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit ${ORIG_ARGS[@]} On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell pwend...@gmail.com wrote: So I reproduced the problem here: == test.sh == #!/bin/bash for x in $@; do echo arg: $x done
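For readers following the quoting problem discussed in this thread, here is a minimal sketch of the underlying shell behaviour. It is not the actual spark-submit code: `show_args`, `forward_unquoted`, and `forward_quoted` are hypothetical helper names made up for illustration, and the option values are borrowed from the earlier messages. It only assumes a POSIX-style shell; in bash, the quoted array form "${ORIG_ARGS[@]}" follows the same rule as "$@".

```shell
#!/bin/sh
# Hypothetical stand-in for the launched program: print one line per argument.
show_args() {
  for x in "$@"; do
    printf 'arg: %s\n' "$x"
  done
}

# Unquoted expansion: the shell re-splits every argument on whitespace,
# so a single option value containing a space arrives as two arguments.
forward_unquoted() {
  show_args $@
}

# Quoted expansion: each original argument is passed through intact.
forward_quoted() {
  show_args "$@"
}

echo "== unquoted =="
forward_unquoted --driver-java-options "-Dfoo=a -Dbar=b"

echo "== quoted =="
forward_quoted --driver-java-options "-Dfoo=a -Dbar=b"
```

With the unquoted form the option value is split into `-Dfoo=a` and `-Dbar=b`, which would explain why only part of the value reaches the JVM; the quoted form keeps `-Dfoo=a -Dbar=b` together as one argument.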

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
fi -$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit ${ORIG_ARGS[@]} On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell pwend...@gmail.com wrote: So I reproduced the problem here: == test.sh

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Patch here: https://github.com/apache/spark/pull/609 On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell pwend...@gmail.com wrote: Dean - our e-mails crossed, but thanks for the tip. Was independently arriving at your solution :) Okay I'll submit something. - Patrick On Wed, Apr 30, 2014

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Patrick Wendell
I'm cancelling this vote in favor of rc6. On Tue, May 13, 2014 at 8:01 AM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 2:49 PM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote: The release files, including signatures

[VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-15 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has a few minor fixes on top of rc5. I've also built the binary artifacts with Hive support enabled so people can test this configuration. When we release 1.0 we might just release both vanilla and

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
I'll start the voting with a +1. On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has minor documentation changes and fixes on top of rc6. The tag to be voted on is v1.0.0

[RESULT][VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Patrick Wendell
This vote is cancelled in favor of rc7. On Wed, May 14, 2014 at 1:02 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has a few minor fixes on top of rc5. I've also built the binary artifacts with Hive

[VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Patrick Wendell
[Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.] Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has only minor changes on top of rc7. The tag to be voted on is v1.0.0-rc8 (commit 80eea0f):

[VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has minor documentation changes and fixes on top of rc6. The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
...@gmail.com wrote: Hi Patrick, Just want to make sure that VOTE for rc6 also cancelled? Thanks, Henry On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell pwend...@gmail.com wrote: I'll start the voting with a +1. On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell pwend

[RESULT] [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
Cancelled in favor of rc9. On Sat, May 17, 2014 at 12:51 AM, Patrick Wendell pwend...@gmail.com wrote: Due to the issue discovered by Michael, this vote is cancelled in favor of rc9. On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust mich...@databricks.com wrote: -1 We found a regression

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
://github.com/apache/spark/pull/808 Michael On Fri, May 16, 2014 at 3:57 PM, Mark Hamstra m...@clearstorydata.com wrote: +1 On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell pwend...@gmail.com wrote: [Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.] Please

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-17 Thread Patrick Wendell
I'll start the voting with a +1. On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has one bug fix and one minor feature on top of rc8: SPARK-1864: https://github.com/apache/spark

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@db - it's possible that you aren't including the jar in the classpath of your driver program (I think this is what mridul was suggesting). It would be helpful to see the stack trace of the CNFE. - Patrick On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell pwend...@gmail.com wrote: @xiangrui

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@xiangrui - we don't expect these to be present on the system classpath, because they get dynamically added by Spark (e.g. your application can call sc.addJar well after the JVM's have started). @db - I'm pretty surprised to see that behavior. It's definitely not intended that users need

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Patrick Wendell
, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: I'll start the voting with a +1. On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has one bug fix and one

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Patrick Wendell
Having a user define a custom class inside of an added jar and instantiate it directly inside of an executor is definitely supported in Spark and has been for a really long time (several years). This is something we do all the time in Spark. DB - I'd hold off on a re-architecting of this

Re: spark 1.0 standalone application

2014-05-19 Thread Patrick Wendell
Whenever we publish a release candidate, we create a temporary maven repository that host the artifacts. We do this precisely for the case you are running into (where a user wants to build an application against it to test). You can build against the release candidate by just adding that

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Patrick Wendell
+1 -- Nan Zhu On Sunday, May 18, 2014 at 11:07 PM, witgo wrote: How to reproduce this bug? -- Original -- From: Patrick Wendell;pwend...@gmail.com (mailto:pwend...@gmail.com); Date: Mon, May 19, 2014 10:08 AM To: dev@spark.apache.org (mailto:dev

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
standalone, A and B are both loaded by the custom classloader, so this issue doesn't come up. -Sandy On Mon, May 19, 2014 at 7:07 PM, Patrick Wendell pwend...@gmail.com wrote: Having a user define a custom class inside of an added jar and instantiate

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
at 3:15 PM, Patrick Wendell pwend...@gmail.com wrote: Of these two solutions I'd definitely prefer 2 in the short term. I'd imagine the fix is very straightforward (it would mostly just be remove code), and we'd be making this more consistent with the standalone mode which makes things way easier

Re: No output from Spark Streaming program with Spark 1.0

2014-05-23 Thread Patrick Wendell
Also one other thing to try, try removing all of the logic from inside of foreach and just printing something. It could be that somehow an exception is being triggered inside of your foreach block and as a result the output goes away. On Fri, May 23, 2014 at 6:00 PM, Patrick Wendell pwend

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-26 Thread Patrick Wendell
Hey Ankur, That does seem like a good fix, but right now we are only blocking the release on major regressions that affect all components. So I don't think this is sufficient to block it from going forward and cutting a new candidate. This is because we are in the very late stage of the release.

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-29 Thread Patrick Wendell
+1 I spun up a few EC2 clusters and ran my normal audit checks. Tests passing, sigs, CHANGES and NOTICE look good Thanks TD for helping cut this RC! On Wed, May 28, 2014 at 9:38 PM, Kevin Markey kevin.mar...@oracle.com wrote: +1 Built -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 Ran current

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-29 Thread Patrick Wendell
[tl;dr stable API's are important - sorry, this is slightly meandering] Hey - just wanted to chime in on this as I was travelling. Sean, you bring up great points here about the velocity and stability of Spark. Many projects have fairly customized semantics around what versions actually mean

Announcing Spark 1.0.0

2014-05-30 Thread Patrick Wendell
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Patrick Wendell
Yeah - Spark streaming needs at least two threads to run. I actually thought we warned the user if they only use one (@tdas?) but the warning might not be working correctly - or I'm misremembering. On Fri, May 30, 2014 at 6:38 AM, Sean Owen so...@cloudera.com wrote: Thanks Nan, that does appear

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Hey guys, thanks for the insights. Also, I realize Hadoop has gotten way better about this with 2.2+ and I think it's great progress. We have well defined API levels in Spark and also automated checking of API violations for new pull requests. When doing code reviews we always enforce the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
what about all its dependencies. Anyways just some things to consider... simplifying our classpath is definitely an avenue worth exploring! On Fri, May 30, 2014 at 2:56 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: Hey

Re: Unable to execute saveAsTextFile on multi node mesos

2014-05-31 Thread Patrick Wendell
Can you look at the logs from the executor or in the UI? They should give an exception with the reason for the task failure. Also in the future, for this type of e-mail please only e-mail the user@ list and not both lists. - Patrick On Sat, May 31, 2014 at 3:22 AM, prabeesh k

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-31 Thread Patrick Wendell
our API's and see if we ever do this. On Fri, May 30, 2014 at 10:54 PM, Patrick Wendell pwend...@gmail.com wrote: Spark is a bit different than Hadoop MapReduce, so maybe that's a source of some confusion. Spark is often used as a substrate for building different types of analytics applications

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
This is a false error message actually - the Maven build no longer requires SCALA_HOME but the message/check was still there. This was fixed recently in master: https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5d27c7021e1ae43d7193 I can back port that fix into branch-1.0 so it will be

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
, 2014 at 11:13 AM, Patrick Wendell pwend...@gmail.com wrote: This is a false error message actually - the Maven build no longer requires SCALA_HOME but the message/check was still there. This was fixed recently in master: https://github.com/apache/spark/commit

Re: Which version does the binary compatibility test against by default?

2014-06-02 Thread Patrick Wendell
Yeah - check out sparkPreviousArtifact in the build: https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L325 - Patrick On Mon, Jun 2, 2014 at 5:30 PM, Xiangrui Meng men...@gmail.com wrote: Is there a way to specify the target version? -Xiangrui

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
Received! On Wed, Jun 4, 2014 at 10:47 AM, Tom Graves tgraves...@yahoo.com.invalid wrote: Testing... Resending as it appears my message didn't go through last week. Tom On Wednesday, May 28, 2014 4:12 PM, Tom Graves tgraves...@yahoo.com wrote: +1. Tested spark on yarn (cluster mode,

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
to get the 1.0.0 stable release from github to deploy on our production cluster ? Is there a tag for 1.0.0 that I should use to deploy ? Thanks. Deb On Wed, Jun 4, 2014 at 10:49 AM, Patrick Wendell pwend...@gmail.com wrote: Received! On Wed, Jun 4, 2014 at 10:47 AM, Tom Graves tgraves

Re: Announcing Spark 1.0.0

2014-06-04 Thread Patrick Wendell
/05/14 3:43 PM, Patrick Wendell pwend...@gmail.com wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever

MIMA Compatiblity Checks

2014-06-08 Thread Patrick Wendell
Hey All, Some people may have noticed PR failures due to binary compatibility checks. We've had these enabled in several of the sub-modules since the 0.9.0 release but we've turned them on in Spark core post 1.0.0 which has much higher churn. The checks are based on the migration manager tool

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
Paul, Could you give the version of Java that you are building with and the version of Java you are running with? Are they the same? Just off the cuff, I wonder if this is related to: https://issues.apache.org/jira/browse/SPARK-1520 If it is, it could appear that certain functions are not in

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
Also I should add - thanks for taking time to help narrow this down! On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend...@gmail.com wrote: Paul, Could you give the version of Java that you are building with and the version of Java you are running with? Are they the same? Just off

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
12:05 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class 1560 06-08-14 12:05 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class Best. -- Paul — p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend

Emergency maintenace on jenkins

2014-06-09 Thread Patrick Wendell
Just a heads up - due to an outage at UCB we've lost several of the Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to compensate, but this might fail some ongoing builds. The good news is if we do get it working with EC2 workers, then we will have burst capability in the future

Re: Emergency maintenace on jenkins

2014-06-10 Thread Patrick Wendell
No luck with this tonight - unfortunately our Python tests aren't working well with Python 2.6 and some other issues made it hard to get the EC2 worker up to speed. Hopefully we can have this up and running tomorrow. - Patrick On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com

Re: Emergency maintenace on jenkins

2014-06-10 Thread Patrick Wendell
Hey just to update people - as of around 1pm PT we were back up and running with Jenkins slaves on EC2. Sorry about the disruption. - Patrick On Tue, Jun 10, 2014 at 1:15 AM, Patrick Wendell pwend...@gmail.com wrote: No luck with this tonight - unfortunately our Python tests aren't working

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-17 Thread Patrick Wendell
Out of curiosity - are you guys using speculation, shuffle consolidation, or any other non-default option? If so that would help narrow down what's causing this corruption. On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman suren.hira...@velos.io wrote: Matt/Ryan, Did you make any headway

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Patrick Wendell
) // block manager conf.set(spark.storage.blockManagerTimeoutIntervalMs, 18) conf.set(spark.blockManagerHeartBeatMs, 8) -Suren On Wed, Jun 18, 2014 at 1:42 AM, Patrick Wendell pwend...@gmail.com wrote: Out of curiosity - are you guys using speculation, shuffle

Re: Scala examples for Spark do not work as written in documentation

2014-06-20 Thread Patrick Wendell
Those are pretty old - but I think the reason Matei did that was to make it less confusing for brand new users. `spark` is actually a valid identifier because it's just a variable name (val spark = new SparkContext()) but I agree this could be confusing for users who want to drop into the shell.

Assorted project updates (tests, build, etc)

2014-06-22 Thread Patrick Wendell
Hey All, 1. The original test infrastructure hosted by the AMPLab has been fully restored and also expanded with many more executor slots for tests. Thanks to Matt Massie at the Amplab for helping with this. 2. We now have a nightly build matrix across different Hadoop versions. It appears that

[VOTE] Release Apache Spark 1.0.1 (RC1)

2014-06-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.1! The tag to be voted on is v1.0.1-rc1 (commit 7feeda3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7feeda3d729f9397aa15ee8750c01ef5aa601962 The release files, including signatures, digests, etc.

Re: Errors from Sbt Test

2014-07-01 Thread Patrick Wendell
Do those also happen if you run other hadoop versions (e.g. try 1.0.4)? On Tue, Jul 1, 2014 at 1:00 AM, Taka Shinagawa taka.epsi...@gmail.com wrote: Since Spark 1.0.0, I've been seeing multiple errors when running sbt test. I ran the following commands from Spark 1.0.1 RC1 on Mac OSX 10.9.2.

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-01 Thread Patrick Wendell
Yeah I created a JIRA a while back to piggy-back the map status info on top of the task (I honestly think it will be a small change). There isn't a good reason to broadcast the entire array and it can be an issue during large shuffles. - Patrick On Mon, Jun 30, 2014 at 7:58 PM, Aaron Davidson

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-01 Thread Patrick Wendell
b) Instead of pulling this information, push it to executors as part of task submission. (What Patrick mentioned ?) (1) a.1 from above is still an issue for this. I don't understand what problem a.1 is. In this case, we don't need to do caching, right? (2) Serialized task size is also a concern :

[RESULT] [VOTE] Release Apache Spark 1.0.1 (RC1)

2014-07-04 Thread Patrick Wendell
port). Is this something the Spark team would consider merging into 1.0.1? Thanks! Andrew On Sun, Jun 29, 2014 at 10:54 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, We're going to move onto another rc because of this vote. Unfortunately with the summit

[VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-04 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.1! The tag to be voted on is v1.0.1-rc2 (commit 7d1043c): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7d1043c99303b87aef8ee19873629c2bfba4cc78 The release files, including signatures, digests, etc.

Testing period for better jenkins integration

2014-07-09 Thread Patrick Wendell
Just a heads up - I've added some better Jenkins integration that posts more useful messages on pull requests. We'll run this side-by-side with the current Jenkins messages for a while to make sure it's working well. Things may be a bit chatty while we are testing this - we can migrate over as

Changes to sbt build have been merged

2014-07-10 Thread Patrick Wendell
Just a heads up, we merged Prashant's work on having the sbt build read all dependencies from Maven. Please report any issues you find on the dev list or on JIRA. One note here for developers, going forward the sbt build will use the same configuration style as the maven build (-D for options and

Re: what is the difference between org.spark-project.hive and org.apache.hadoop.hive

2014-07-11 Thread Patrick Wendell
There are two differences: 1. We publish hive with a shaded protobuf dependency to avoid conflicts with some Hadoop versions. 2. We publish a proper hive-exec jar that only includes hive packages. The upstream version of hive-exec bundles a bunch of other random dependencies in it which makes it

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
with your application. But on the other hand the release does fix some critical bugs that affect all users. We can always do 1.0.2 later if we discover a problem. Matei On Jul 10, 2014, at 9:40 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Gary, The vote technically doesn't close until I

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
Okay just FYI - I'm closing this vote since many people are waiting on the release and I was hoping to package it today. If we find a reproducible Mesos issue here, we can definitely spin the fix into a subsequent release. On Fri, Jul 11, 2014 at 9:37 AM, Patrick Wendell pwend...@gmail.com

[RESULT] [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
This vote has passed with 9 +1 votes (5 binding) and 1 -1 vote (0 binding). +1: Patrick Wendell* Mark Hamstra* DB Tsai Krishna Sankar Soren Macbeth Andrew Or Matei Zaharia* Xiangrui Meng* Tom Graves* 0: -1: Gary Malouf

Announcing Spark 1.0.1

2014-07-11 Thread Patrick Wendell
I am happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark's (alpha) SQL library, including support for

Re: how to run the program compiled with spark 1.0.0 in the branch-0.1-jdbc cluster

2014-07-14 Thread Patrick Wendell
1. The first error I met is the different SerializationVersionUID in ExecuterStatus I resolved by explicitly declare SerializationVersionUID in ExecuterStatus.scala and recompile branch-0.1-jdbc I don't think there is a class in Spark named ExecuterStatus (sic) ... or ExecutorStatus. Is
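
For readers unfamiliar with the workaround described in the quoted message, here is a minimal sketch of what "explicitly declaring a serial version UID" means on the JVM. It is not taken from Spark: the class `StatusMessage` and its field are hypothetical stand-ins, and the example uses plain Java serialization only. The idea is that a fixed `serialVersionUID` stops the JVM from deriving one from the class shape, so two builds of the same class can still deserialize each other's objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {
    // Hypothetical message class (an assumption for illustration, not a Spark class).
    static class StatusMessage implements Serializable {
        // Pinning the UID means recompiling the class does not change it.
        private static final long serialVersionUID = 1L;
        final String executorId;

        StatusMessage(String executorId) {
            this.executorId = executorId;
        }
    }

    // Returns the UID the serialization runtime will write for StatusMessage.
    static long declaredUid() {
        return ObjectStreamClass.lookup(StatusMessage.class).getSerialVersionUID();
    }

    // Serializes the message to bytes and reads it back.
    static StatusMessage roundTrip(StatusMessage msg) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (StatusMessage) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("serialVersionUID = " + declaredUid());
        System.out.println("round trip id = " + roundTrip(new StatusMessage("exec-1")).executorId);
    }
}
```

Pinning the UID only hides the mismatch check; if the two builds really differ in field layout, deserialization can still fail or silently drop data, which is why running matching versions on both sides is usually the safer fix.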

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Patrick Wendell
Hey Cody, This Jstack seems truncated, would you mind giving the entire stack trace? For the second thread, for instance, we can't see where the lock is being acquired. - Patrick On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger cody.koenin...@mediacrossing.com wrote: Hi all, just wanted to give
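
As a rough illustration of the information requested here (full stack frames plus which lock each thread holds or waits on), the sketch below collects it from inside a JVM using the standard `java.lang.management` API. It is an assumption-laden example, not something from the thread: the class name `DeadlockReport` is hypothetical, and it only reports monitor or synchronizer deadlocks that the JVM itself can detect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockReport {
    // Builds a text report of deadlocked threads, or a fixed message if none are found.
    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            sb.append('"').append(info.getThreadName()).append("\" waiting on ")
              .append(info.getLockName()).append(" held by ")
              .append(info.getLockOwnerName()).append('\n');
            // Print every frame rather than a truncated view.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            // Show where each held monitor was acquired.
            for (MonitorInfo monitor : info.getLockedMonitors()) {
                sb.append("    holds ").append(monitor)
                  .append(" locked at ").append(monitor.getLockedStackFrame()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

The frame-by-frame loop is deliberate: `ThreadInfo.toString()` abbreviates long stacks, which would reproduce the truncation problem raised in this message.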

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Patrick Wendell
:44 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Cody, This Jstack seems truncated, would you mind giving the entire stack trace? For the second thread, for instance, we can't see where the lock is being acquired. - Patrick On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Patrick Wendell
synchronize conf it will deadlock but it is ok when synchronize initLocalJobConfFuncOpt Here's the entire jstack output. On Mon, Jul 14, 2014 at 4:44 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Cody, This Jstack seems truncated, would you mind giving

Re: Catalyst dependency on Spark Core

2014-07-14 Thread Patrick Wendell
Adding new build modules is pretty high overhead, so if this is a case where a small amount of duplicated code could get rid of the dependency, that could also be a good short-term option. - Patrick On Mon, Jul 14, 2014 at 2:15 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yeah, I'd just add
