Assorted project updates (tests, build, etc)

2014-06-22 Thread Patrick Wendell
Hey All, 1. The original test infrastructure hosted by the AMPLab has been fully restored and also expanded with many more executor slots for tests. Thanks to Matt Massie at the AMPLab for helping with this. 2. We now have a nightly build matrix across different Hadoop versions. It appears that t

Re: Scala examples for Spark do not work as written in documentation

2014-06-20 Thread Patrick Wendell
Those are pretty old - but I think the reason Matei did that was to make it less confusing for brand new users. `spark` is actually a valid identifier because it's just a variable name (val spark = new SparkContext()) but I agree this could be confusing for users who want to drop into the shell. O

Re: Trailing Tasks Saving to HDFS

2014-06-19 Thread Patrick Wendell
I'll make a comment on the JIRA - thanks for reporting this, let's get to the bottom of it. On Thu, Jun 19, 2014 at 11:19 AM, Surendranauth Hiraman wrote: > I've created an issue for this but if anyone has any advice, please let me > know. > > Basically, on about 10 GBs of data, saveAsTextFile()

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Patrick Wendell
ds", "300") >> conf.set("spark.akka.timeout", "180") >> conf.set("spark.akka.frameSize", "100") >> conf.set("spark.akka.batchSize", "30") >> conf.set("spark.akka

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-17 Thread Patrick Wendell
Out of curiosity - are you guys using speculation, shuffle consolidation, or any other non-default option? If so that would help narrow down what's causing this corruption. On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman wrote: > Matt/Ryan, > > Did you make any headway on this? My team is

Re: Emergency maintenace on jenkins

2014-06-10 Thread Patrick Wendell
Hey just to update people - as of around 1pm PT we were back up and running with Jenkins slaves on EC2. Sorry about the disruption. - Patrick On Tue, Jun 10, 2014 at 1:15 AM, Patrick Wendell wrote: > No luck with this tonight - unfortunately our Python tests aren't > working well

Re: Emergency maintenace on jenkins

2014-06-10 Thread Patrick Wendell
No luck with this tonight - unfortunately our Python tests aren't working well with Python 2.6 and some other issues made it hard to get the EC2 worker up to speed. Hopefully we can have this up and running tomorrow. - Patrick On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell wrote: >

Emergency maintenace on jenkins

2014-06-09 Thread Patrick Wendell
Just a heads up - due to an outage at UCB we've lost several of the Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to compensate, but this might fail some ongoing builds. The good news is if we do get it working with EC2 workers, then we will have burst capability in the future -

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
RDD$anonfun$saveAsTextFile$2.class > > > Best. > -- Paul > > — > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ > > > On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell wrote: >> >> Paul, >> >> Could you give the version of Java t

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
Also I should add - thanks for taking time to help narrow this down! On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell wrote: > Paul, > > Could you give the version of Java that you are building with and the > version of Java you are running with? Are they the same? > > Just off

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
Paul, Could you give the version of Java that you are building with and the version of Java you are running with? Are they the same? Just off the cuff, I wonder if this is related to: https://issues.apache.org/jira/browse/SPARK-1520 If it is, it could appear that certain functions are not in the

MIMA Compatiblity Checks

2014-06-08 Thread Patrick Wendell
Hey All, Some people may have noticed PR failures due to binary compatibility checks. We've had these enabled in several of the sub-modules since the 0.9.0 release but we've turned them on in Spark core post 1.0.0 which has much higher churn. The checks are based on the "migration manager" tool f

Re: Announcing Spark 1.0.0

2014-06-04 Thread Patrick Wendell
> >>Thanks, >>Rahul Singhal >> >> >> >> >> >>On 30/05/14 3:43 PM, "Patrick Wendell" wrote: >> >>>I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 >>>is a milestone release as the first in t

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
way to get the 1.0.0 stable release from github to deploy on our > production cluster ? Is there a tag for 1.0.0 that I should use to deploy ? > > Thanks. > Deb > > > > On Wed, Jun 4, 2014 at 10:49 AM, Patrick Wendell wrote: > >> Received! >> >> On

Re: What is the correct Spark version of master/branch-1.0?

2014-06-04 Thread Patrick Wendell
It should be 1.1-SNAPSHOT. Feel free to submit a PR to clean up any inconsistencies. On Tue, Jun 3, 2014 at 8:33 PM, Takuya UESHIN wrote: > Hi all, > > I'm wondering what is the correct Spark version of each HEAD of master > and branch-1.0. > > current master HEAD (e8d93ee5284cb6a1d4551effe91ee8d

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
Received! On Wed, Jun 4, 2014 at 10:47 AM, Tom Graves wrote: > Testing... Resending as it appears my message didn't go through last week. > > Tom > > > On Wednesday, May 28, 2014 4:12 PM, Tom Graves wrote: > > > > +1. Tested spark on yarn (cluster mode, client mode, pyspark, spark-shell) on > h

Spark 1.1 Window and 1.0 Wrap-up

2014-06-02 Thread Patrick Wendell
Hey All, I wanted to announce the Spark 1.1 release window: June 1 - Merge window opens; July 25 - Cut-off for new pull requests; August 1 - Merge window closes (code freeze), QA period starts; August 15+ - RCs and voting. This is consistent with the "3 month" release cycle we are targeting. I'd

Re: Which version does the binary compatibility test against by default?

2014-06-02 Thread Patrick Wendell
Yeah - check out sparkPreviousArtifact in the build: https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L325 - Patrick On Mon, Jun 2, 2014 at 5:30 PM, Xiangrui Meng wrote: > Is there a way to specify the target version? -Xiangrui

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
Sun, Jun 1, 2014 at 11:13 AM, Patrick Wendell wrote: > This is a false error message actually - the Maven build no longer > requires SCALA_HOME but the message/check was still there. This was > fixed recently in master: > > https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
This is a false error message actually - the Maven build no longer requires SCALA_HOME but the message/check was still there. This was fixed recently in master: https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5d27c7021e1ae43d7193 I can back port that fix into branch-1.0 so it will be i

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-31 Thread Patrick Wendell
e good to audit our APIs and see if we ever do this. On Fri, May 30, 2014 at 10:54 PM, Patrick Wendell wrote: > Spark is a bit different than Hadoop MapReduce, so maybe that's a > source of some confusion. Spark is often used as a substrate for > building different t

Re: Unable to execute saveAsTextFile on multi node mesos

2014-05-31 Thread Patrick Wendell
Can you look at the logs from the executor or in the UI? They should give an exception with the reason for the task failure. Also in the future, for this type of e-mail please only e-mail the "user@" list and not both lists. - Patrick On Sat, May 31, 2014 at 3:22 AM, prabeesh k wrote: > Hi, > >

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
op version? Would the Hadoop dep just not be shaded... if so, what about all its dependencies? Anyways just some things to consider... simplifying our classpath is definitely an avenue worth exploring! On Fri, May 30, 2014 at 2:56 PM, Colin McCabe wrote: > On Fri, May 30, 2014 at 2:11 PM,

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Hey guys, thanks for the insights. Also, I realize Hadoop has gotten way better about this with 2.2+ and I think it's great progress. We have well defined API levels in Spark and also automated checking of API violations for new pull requests. When doing code reviews we always enforce the narrowes

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Patrick Wendell
Yeah - Spark streaming needs at least two threads to run. I actually thought we warned the user if they only use one (@tdas?) but the warning might not be working correctly - or I'm misremembering. On Fri, May 30, 2014 at 6:38 AM, Sean Owen wrote: > Thanks Nan, that does appear to fix it. I was u
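A rough way to see why a single thread is not enough, under the assumption that a receiver-based input stream keeps one thread busy for as long as it runs: the sketch below uses a plain `java.util.concurrent` pool as a stand-in for Spark's scheduler, so `ReceiverThreadsSketch` and `processedWithin` are hypothetical names, not Spark APIs. In an actual local-mode streaming job the corresponding knob is the master URL, e.g. `local[2]` instead of `local`.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object ReceiverThreadsSketch {
  /** Returns true if a "processing" task finishes within `timeoutMs`
    * while a long-running "receiver" task occupies one pool thread. */
  def processedWithin(threads: Int, timeoutMs: Long = 300): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val stopReceiver = new AtomicBoolean(false)
    val processed = new CountDownLatch(1)

    // Stand-in for a receiver: it holds its thread until told to stop.
    pool.submit(new Runnable {
      def run(): Unit = while (!stopReceiver.get()) Thread.sleep(10)
    })
    // Stand-in for the batch-processing work that should run alongside it.
    pool.submit(new Runnable {
      def run(): Unit = processed.countDown()
    })

    val ok = processed.await(timeoutMs, TimeUnit.MILLISECONDS)
    stopReceiver.set(true) // let the receiver exit so the pool can drain
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    ok
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread:  processed = ${processedWithin(1)}")
    println(s"2 threads: processed = ${processedWithin(2)}")
  }
}
```

With one thread the processing task stays queued behind the receiver until the receiver is stopped, which mirrors the "no output" symptom; with two threads it runs right away.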

Announcing Spark 1.0.0

2014-05-30 Thread Patrick Wendell
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyon

Re: [RESULT][VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-29 Thread Patrick Wendell
ed the RC and voted. Here are the totals: > > +1: (13 votes) > Matei Zaharia* > Mark Hamstra* > Holden Karau > Nick Pentreath* > Will Benton > Henry Saputra > Sean McNamara* > Xiangrui Meng* > Andy Konwinski* > Krishna Sankar > Kevin Markey > Patrick We

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-29 Thread Patrick Wendell
[tl;dr stable APIs are important - sorry, this is slightly meandering] Hey - just wanted to chime in on this as I was travelling. Sean, you bring up great points here about the velocity and stability of Spark. Many projects have fairly customized semantics around what versions actually mean (HBas

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-29 Thread Patrick Wendell
+1 I spun up a few EC2 clusters and ran my normal audit checks. Tests passing, sigs, CHANGES and NOTICE look good Thanks TD for helping cut this RC! On Wed, May 28, 2014 at 9:38 PM, Kevin Markey wrote: > +1 > > Built -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 > Ran current version of one of my

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-26 Thread Patrick Wendell
Hey Ankur, That does seem like a good fix, but right now we are only blocking the release on major regressions that affect all components. So I don't think this is sufficient to block it from going forward and cutting a new candidate. This is because we are in the very late stage of the release.

Re: all values for a key must fit in memory

2014-05-25 Thread Patrick Wendell
Nilesh - out of curiosity - what operation are you doing on the values for the key? On Sun, May 25, 2014 at 6:35 PM, Nilesh wrote: > Hi Andrew, > > Thanks for the reply! > > It's clearer about the API part now. That's what I wanted to know. > > Wow, tuples, why didn't that occur to me. That's a l

Re: No output from Spark Streaming program with Spark 1.0

2014-05-23 Thread Patrick Wendell
Also one other thing to try: remove all of the logic from inside of foreach and just print something. It could be that somehow an exception is being triggered inside of your foreach block and as a result the output goes away. On Fri, May 23, 2014 at 6:00 PM, Patrick Wendell wrote: >

Re: No output from Spark Streaming program with Spark 1.0

2014-05-23 Thread Patrick Wendell
Hey Jim, Do you see the same behavior if you run this outside of Eclipse? Also, if you print something to standard out when setting up your streams (i.e. not inside of the foreach), do you see that? This could be a streaming issue, but it could also be something related to the way it'

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
1, 2014 at 3:15 PM, Patrick Wendell wrote: > Of these two solutions I'd definitely prefer 2 in the short term. I'd > imagine the fix is very straightforward (it would mostly just be > remove code), and we'd be making this more consistent with the > standalone mode wh

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
> > >> >>> >> > >>> It just hit me why this problem is showing up on YARN and not on >>> >> > >>> standalone. >>> >> > >>> >>> >> > >>> The relevant difference betwe

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Patrick Wendell
> 2.3 > > +1 > > -- > Nan Zhu > > > On Sunday, May 18, 2014 at 11:07 PM, witgo wrote: > >> How to reproduce this bug? >> >> >> -- Original -- >> From: "Patrick Wendell";mailto:pwend..

Re: spark 1.0 standalone application

2014-05-19 Thread Patrick Wendell
Whenever we publish a release candidate, we create a temporary Maven repository that hosts the artifacts. We do this precisely for the case you are running into (where a user wants to build an application against it to test). You can build against the release candidate by just adding that repositor

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Patrick Wendell
Having a user define a custom class inside of an added jar and instantiate it directly inside of an executor is definitely supported in Spark and has been for a really long time (several years). This is something we do all the time in Spark. DB - I'd hold off on a re-architecting of this until

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Patrick Wendell
>> mode and yarn-cluster mode. >>> >>> >>> On Sat, May 17, 2014 at 10:08 AM, Andrew Or wrote: >>> >>>> +1 >>>> >>>> >>>> 2014-05-17 8:53 GMT-07:00 Mark Hamstra : >>>> >>>>> +1 >>

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@xiangrui - we don't expect these to be present on the system classpath, because they get dynamically added by Spark (e.g. your application can call sc.addJar well after the JVMs have started). @db - I'm pretty surprised to see that behavior. It's definitely not intended that users need reflectio

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@db - it's possible that you aren't including the jar in the classpath of your driver program (I think this is what mridul was suggesting). It would be helpful to see the stack trace of the CNFE. - Patrick On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell wrote: > @xiangrui - we

[VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-17 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has one bug fix and one minor feature on top of rc8: SPARK-1864: https://github.com/apache/spark/pull/808 SPARK-1808: https://github.com/apache/spark/pull/799 The tag to be voted on is v1.0.0-rc9 (commit 920f947):

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-17 Thread Patrick Wendell
I'll start the voting with a +1. On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > This has one bug fix and one minor feature on top of rc8: > SPARK-1864: https://github.com/apac

[RESULT] [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
Cancelled in favor of rc9. On Sat, May 17, 2014 at 12:51 AM, Patrick Wendell wrote: > Due to the issue discovered by Michael, this vote is cancelled in favor of > rc9. > > On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust > wrote: >> -1 >> >> We found a regre

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
https://github.com/apache/spark/pull/808 > > Michael > > > On Fri, May 16, 2014 at 3:57 PM, Mark Hamstra > wrote: >> >> +1 >> >> >> On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell >> wrote: >> >> > [Due to ASF e-mail outage, I'm

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
>> wrote: >> > It was, but due to the apache infra issues, some may not have received >> the >> > email yet... >> > >> > On Fri, May 16, 2014 at 10:48 AM, Henry Saputra > > >> > wrote: >> >> >> >> Hi Patrick, >>

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
uld fix the bug, and should > also test the previous release > -- Original ------ > From: "Patrick Wendell";; > Date: Wed, May 14, 2014 03:02 PM > To: "dev@spark.apache.org"; > > Subject: Re: [VOTE] Release Apache Spark 1.0.0 (r

[VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has minor documentation changes and fixes on top of rc6. The tag to be voted on is v1.0.0-rc7 (commit 9212b3e): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108

[VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Patrick Wendell
[Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.] Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has only minor changes on top of rc7. The tag to be voted on is v1.0.0-rc8 (commit 80eea0f): https://git-wip-us.apache.org/repos/asf?p=spa

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
n Thu, May 15, 2014 at 10:23 AM, Patrick Wendell wrote: > Thanks for your feedback. Since it's not a regression, it won't block > the release. > > On Wed, May 14, 2014 at 12:17 AM, witgo wrote: >> SPARK-1817 will cause users to get incorrect results and RDD.zip is co

[RESULT][VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Patrick Wendell
This vote is cancelled in favor of rc7. On Wed, May 14, 2014 at 1:02 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This patch has a few minor fixes on top of rc5. I've also built the > binary artifac

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
I'll start the voting with a +1. On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This patch has minor documentation changes and fixes on top of rc6. > > The tag to be voted o

[VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-15 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has a few minor fixes on top of rc5. I've also built the binary artifacts with Hive support enabled so people can test this configuration. When we release 1.0 we might just release both vanilla and Hive-enab

[RESULT] [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Patrick Wendell
This vote is cancelled in favor of rc6. On Wed, May 14, 2014 at 1:04 PM, Patrick Wendell wrote: > I'm cancelling this vote in favor of rc6. > > On Tue, May 13, 2014 at 8:01 AM, Sean Owen wrote: >> On Tue, May 13, 2014 at 2:49 PM, Sean Owen wrote: >>> On Tue, May

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Patrick Wendell
I'm cancelling this vote in favor of rc6. On Tue, May 13, 2014 at 8:01 AM, Sean Owen wrote: > On Tue, May 13, 2014 at 2:49 PM, Sean Owen wrote: >> On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell wrote: >>> The release files, including signatures, digests, etc. ca

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread Patrick Wendell
Original ------ > From: "Patrick Wendell";; > Date: Wed, May 14, 2014 04:07 AM > To: "dev@spark.apache.org"; > > Subject: Re: [VOTE] Release Apache Spark 1.0.0 (rc5) > > > > Hey all - there were some earlier RC's that

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Patrick Wendell
Zhu wrote: > just curious, where is rc4 VOTE? > > I searched my gmail but didn't find that? > > > > > On Tue, May 13, 2014 at 9:49 AM, Sean Owen wrote: > >> On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell >> wrote: >> > The release files,

[VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! The tag to be voted on is v1.0.0-rc5 (commit 18f0623): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=18f062303303824139998e8fc8f4158217b0dbc3 The release files, including signatures, digests, etc. can

Re: Updating docs for running on Mesos

2014-05-11 Thread Patrick Wendell
Andrew, Updating these docs would be great! I think this would be a welcome change. In terms of packaging, it would be good to mention the binaries produced by the upstream project as well, in addition to Mesosphere. - Patrick On Thu, May 8, 2014 at 12:51 AM, Andrew Ash wrote: > The docs for h

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Patch here: https://github.com/apache/spark/pull/609 On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell wrote: > Dean - our e-mails crossed, but thanks for the tip. Was independently > arriving at your solution :) > > Okay I'll submit something. > > - Patrick > > On

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Dean - our e-mails crossed, but thanks for the tip. Was independently arriving at your solution :) Okay I'll submit something. - Patrick On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin wrote: > Cool, that seems to work. Thanks! > > On Wed, Apr 30, 2014 at 2:09 PM, Patrick

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
" export SPARK_MEM=$DRIVER_MEMORY fi -$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}" On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell wrote: > So I reproduced the prob

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
I'll dig around a bit more and see if we can fix it. Pretty sure we aren't passing these argument arrays around correctly in bash. On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin wrote: > On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell wrote: >> Yeah I think the problem

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
for more information or --verbose for debugging output > > > On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell wrote: >> I added a fix for this recently and it didn't require adding -J >> notation - are you trying it with this patch? >> >> https://issues.apache

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
I added a fix for this recently and it didn't require adding -J notation - are you trying it with this patch? https://issues.apache.org/jira/browse/SPARK-1654 ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b" scala> sys.props.get("foo") res0: Option[String] = Some(a) scala> sys.props.get

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
override > flag, method call, or method argument actually exist? > > Thanks, > Dean > > > On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell wrote: > >> Hi Dean, >> >> We always used the Hadoop libraries here to read and write local >> files. In

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
sHadoopDataset(PairRDDFunctions.scala:749) > at > org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662) > at > org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581) > at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057) > at spark.activ

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
this pretty tricky. On Tue, Apr 29, 2014 at 11:47 AM, Patrick Wendell wrote: >> What are the expectations / guarantees on binary compatibility between >> 0.9 and 1.0? > > There are no guarantees.

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
> What are the expectations / guarantees on binary compatibility between > 0.9 and 1.0? There are no guarantees.

Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
Hey All, This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-ma

Re: Fw: Is there any way to make a quick test on some pre-commit code?

2014-04-24 Thread Patrick Wendell
This is already on the wiki: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools On Wed, Apr 23, 2014 at 6:52 PM, Nan Zhu wrote: > I'm just asked by others for the same question > > I think Reynold gave a pretty helpful tip on this, > > Shall we put this on Contribute-to-

Re: all values for a key must fit in memory

2014-04-20 Thread Patrick Wendell
Just wanted to mention - one common thing I've seen users do is use groupByKey, then do something that is commutative and associative once the values are grouped. Really users here should be doing reduceByKey. rdd.groupByKey().map { case (key, values) => (key, values.sum) } rdd.reduceByKey(_ + _) I
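To illustrate why reduceByKey is preferable when the operation is commutative and associative, here is a sketch on plain Scala collections rather than RDDs; partitions are modeled as a `Seq` of `Seq`s, and `ReduceVsGroupSketch` and its methods are hypothetical. It assumes, roughly as Spark does for reduceByKey, that values are first combined within each partition and the partial results are then merged, so a key's values are never gathered into one collection.

```scala
object ReduceVsGroupSketch {
  type Partition = Seq[(String, Int)]

  // groupByKey-style: every value for a key is gathered into one collection
  // before summing, so a very hot key means a very large in-memory group.
  def sumViaGroup(parts: Seq[Partition]): Map[String, Int] =
    parts.flatten
      .groupBy(_._1)
      .map { case (key, kvs) => (key, kvs.map(_._2).sum) }

  // reduceByKey-style: each partition keeps only a running total per key
  // (a map-side combine), then the per-partition totals are merged.
  // This is valid only because + is commutative and associative.
  def sumViaReduce(parts: Seq[Partition]): Map[String, Int] = {
    def combine(acc: Map[String, Int], kv: (String, Int)): Map[String, Int] =
      acc.updated(kv._1, acc.getOrElse(kv._1, 0) + kv._2)

    val perPartition = parts.map(_.foldLeft(Map.empty[String, Int])(combine))
    perPartition.foldLeft(Map.empty[String, Int]) { (merged, partial) =>
      partial.foldLeft(merged)(combine)
    }
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq("a" -> 1, "b" -> 2, "a" -> 3), Seq("a" -> 4, "c" -> 5))
    println(sumViaGroup(parts))
    println(sumViaReduce(parts))
  }
}
```

Both functions return the same totals; the difference is only in how much data has to be held per key along the way.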

Re: [jira] [Commented] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence

2014-04-16 Thread Patrick Wendell
oject: Spark > > Issue Type: Improvement > > Components: Spark Core > >Reporter: Patrick Wendell > >Assignee: Patrick Wendell > > Fix For: 1.0.0 > > > > > > This is pretty confusing, especially since addJar expects to take a > single jar. > > > > -- > This message was sent by Atlassian JIRA > (v6.2#6252) >

Re: It seems that jenkins for PR is not working

2014-04-15 Thread Patrick Wendell
There are a few things going on here wrt tests. 1. I fixed up the RAT issues with a hotfix. 2. The Hive tests were actually disabled for a while accidentally. A recent fix correctly re-enabled them. Without Hive, Spark tests run in about 40 minutes and with Hive they run in 1 hour and 15 minutes, s

Re: org.apache.spark.util.Vector is deprecated what next ?

2014-04-10 Thread Patrick Wendell
You'll need to use the associated functionality in Breeze and then create a dense vector from a Breeze vector. I have a JIRA for us to update the examples for 1.0... I'm hoping Xiangrui can take a look at it. https://issues.apache.org/jira/browse/SPARK-1464 https://github.com/scalanlp/breeze/wik

branch-1.0 cut

2014-04-09 Thread Patrick Wendell
Hey All, In accordance with the scheduled window for the release I've cut a 1.0 branch. Thanks a ton to everyone for being so active in reviews during the last week. In the last 7 days we've merged 66 new patches, and every one of them has undergone thorough peer-review. Tons of committers have be

Re: Flaky streaming tests

2014-04-07 Thread Patrick Wendell
TD - do you know what is going on here? I looked into this a bit, and at least a few of these use Thread.sleep() and assume the sleep will be exact, which is wrong. We should disable all the tests that do, and they should probably be rewritten to virtualize time. - Patrick On Mon, Apr 7, 20
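One way to "virtualize time" in tests, sketched under assumptions: code under test reads time from an injected clock instead of calling `System.currentTimeMillis()` or relying on `Thread.sleep()`, and tests use a clock that only moves when advanced explicitly. `Clock`, `ManualClock`, and `BatchTimer` here are illustrative names; Spark's own test utilities may be organized differently.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical clock abstraction injected into the code under test.
trait Clock {
  def currentTimeMs: Long
}

// Production implementation backed by wall-clock time.
object SystemClock extends Clock {
  def currentTimeMs: Long = System.currentTimeMillis()
}

// Test implementation: time only moves when the test calls advance().
final class ManualClock(private var now: Long = 0L) extends Clock {
  def currentTimeMs: Long = synchronized(now)
  def advance(ms: Long): Unit = synchronized { now += ms }
}

// Hypothetical component that marks a batch boundary every `intervalMs`.
final class BatchTimer(clock: Clock, intervalMs: Long) {
  require(intervalMs > 0, "interval must be positive")
  private var nextBatch = clock.currentTimeMs + intervalMs
  val firedAt = ArrayBuffer.empty[Long]

  // Records every boundary that has passed according to the injected clock.
  def poll(): Unit =
    while (clock.currentTimeMs >= nextBatch) {
      firedAt += nextBatch
      nextBatch += intervalMs
    }
}
```

A test can then call `clock.advance(2500)` followed by `timer.poll()` and assert exactly two boundaries (at 1000 and 2000 ms), with no dependence on how long a sleep actually took.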

Re: Master compilation

2014-04-05 Thread Patrick Wendell
If you want to submit a hot fix for this issue specifically please do. I'm not sure why it didn't fail our build... On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das wrote: > I verified this is happening for both CDH4.5 and 1.0.4...My deploy > environment is Java 6...so Java 7 compilation is not goin

Re: Recent heartbeats

2014-04-04 Thread Patrick Wendell
I answered this over on the user list... On Fri, Apr 4, 2014 at 6:13 PM, Debasish Das wrote: > Hi, > > Also posted it on user but then I realized it might be more involved. > > In my ALS runs I am noticing messages that complain about heart beats: > > 14/04/04 20:43:09 WARN BlockManagerMasterAct

Re: Would anyone mind having a quick look at PR#288?

2014-04-02 Thread Patrick Wendell
Hey Evan, Ya thanks this is a pretty small patch. Should definitely be do-able for 1.0. - Patrick On Wed, Apr 2, 2014 at 10:25 AM, Evan Chan wrote: > https://github.com/apache/spark/pull/288 > > It's for fixing SPARK-1154, which would help Spark be a better citizen for > most deploys, and sho

Re: sbt-package-bin

2014-04-01 Thread Patrick Wendell
Ya there is already some fragmentation here. Maven has some "dist" targets and there is also ./make-distribution.sh. On Tue, Apr 1, 2014 at 11:31 AM, Mark Hamstra wrote: > A basic Debian package can already be created from the Maven build: mvn > -Pdeb ... > > > On Tue, Apr 1, 2014 at 11:24 AM, E

Re: sbt-package-bin

2014-04-01 Thread Patrick Wendell
And there is a deb target as well - ah didn't see Mark's email. On Tue, Apr 1, 2014 at 11:36 AM, Patrick Wendell wrote: > Ya there is already some fragmentation here. Maven has some "dist" targets > and there is also ./make-distribution.sh. > > > On Tue, Ap

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-04-01 Thread Patrick Wendell
14 1:33 PM, Tathagata Das < > tathagata.das1...@gmail.com> wrote: > > Yes, let's extend the vote for two more days from now. So the vote is open > till Wednesday, April 02, at 20:00 UTC > > On that note, my +1 > > > TD > > > > > > > On Mon, Mar 31, 2

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Patrick Wendell
days, it makes it really hard for anyone who is offline for the > weekend to try it out. Either that or extend the voting for more than 3 > days. > > Tom > On Monday, March 31, 2014 12:50 AM, Patrick Wendell > wrote: > > TD - I downloaded and did some local testing. Looks

Re: The difference between driver and master in Spark

2014-03-31 Thread Patrick Wendell
Check out this page: http://spark.incubator.apache.org/docs/latest/cluster-overview.html On Mon, Mar 31, 2014 at 9:11 AM, Nan Zhu wrote: > master is managing the resources in the cluster, e.g. ensuring all > components can work together, master/worker/driver > > e.g. you have to submit your appl

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread Patrick Wendell
TD - I downloaded and did some local testing. Looks good to me! +1 You should cast your own vote - at that point it's enough to pass. - Patrick On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k wrote: > +1 > tested on Ubuntu12.04 64bit > > > On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia >wrote:

Migration to the new Spark JIRA

2014-03-29 Thread Patrick Wendell
Hey All, We've successfully migrated the Spark JIRA to the Apache infrastructure. This turned out to be a huge effort, led by Andy Konwinski, who deserves all of our deepest appreciation for managing this complex migration. Since Apache runs the same JIRA version as Spark's existing JIRA, there i

Re: Could you undo the JIRA dev list e-mails?

2014-03-29 Thread Patrick Wendell
st dev | grep jira > [hermes] 8:21pm spark.apache.org > > > Note that I and other moderators will now receive moderation > > emails until the infra ticket is fixed but others will not. > I'll set up a mail filter. > > Chris > > > -Original Message- >

Re: JIRA. github and asf updates

2014-03-29 Thread Patrick Wendell
e mails :-) > Btw, this is a good problem to have - a vibrant and very actively > engaged community generated a lot of meaningful traffic! > I just don't want to get distracted from it by repetitions. > > Regards, > Mridul > > > On Sat, Mar 29, 2014 at 11:46 PM, Patrick W

Re: Could you undo the JIRA dev list e-mails?

2014-03-29 Thread Patrick Wendell
Okay I think I managed to revert this by just removing jira@a.o from our dev list. On Sat, Mar 29, 2014 at 11:37 AM, Patrick Wendell wrote: > Hey Chris, > > I don't think our JIRA has been fully migrated to Apache infra, so it's > really confusing to send people e-mail

Could you undo the JIRA dev list e-mails?

2014-03-29 Thread Patrick Wendell
Hey Chris, I don't think our JIRA has been fully migrated to Apache infra, so it's really confusing to send people e-mails referring to the new JIRA since we haven't announced it yet. There is some content there because we've been trying to do the migration, but I'm not sure it's entirely finished

Re: JIRA. github and asf updates

2014-03-29 Thread Patrick Wendell
Ah sorry I see - Jira updates are going to the dev list. Maybe that's not desirable. I think we should send them to the issues@ list. On Sat, Mar 29, 2014 at 11:16 AM, Patrick Wendell wrote: > Mridul, > > You can unsubscribe yourself from any of these sources, right? > > -

Re: JIRA. github and asf updates

2014-03-29 Thread Patrick Wendell
Mridul, You can unsubscribe yourself from any of these sources, right? - Patrick On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan wrote: > Hi, > > So we are now receiving updates from three sources for each change to > the PR. > While each of them handles a corner case which others migh

Re: Scala 2.10.4

2014-03-28 Thread Patrick Wendell
Really - I didn't know this ever was changed. But in any case, I think you can compile with 2.10.4 and run with 2.10.3 and it's fine - right? On Fri, Mar 28, 2014 at 11:48 AM, Matei Zaharia wrote: > We don't actually use Scala from the user's OS anymore, we use it from the > Spark build, so it's

Re: Mailbomb from amplabs jenkins ?

2014-03-27 Thread Patrick Wendell
Yeah sorry guys - Jenkins is having some issues and there isn't a way to fix this that doesn't spam people following github. Apologies! On Thu, Mar 27, 2014 at 8:16 PM, Nan Zhu wrote: > yes, it sends for every PR you were involved > > I think Patrick is doing something on Jenkins, he just stopp

Re: Spark 0.9.1 release

2014-03-26 Thread Patrick Wendell
Hey TD, This one we just merged into master this morning: https://spark-project.atlassian.net/browse/SPARK-1322 It should definitely go into the 0.9 branch because there was a bug in the semantics of top() which at this point is unreleased in Python. I didn't backport it yet because I figured yo

Re: Travis CI

2014-03-25 Thread Patrick Wendell
ound that the Jenkins is not working from this afternoon > > for one PR, the first time build failed after 90 minutes, the second time it > has run for more than 2 hours, no result is returned > > Best, > > -- > Nan Zhu > > > On Tuesday, March 25, 2014 at 10:06 PM,

Re: Travis CI

2014-03-25 Thread Patrick Wendell
That's not correct - like Michael said the Jenkins build remains the reference build for now. On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu wrote: > I assume the Jenkins is not working now? > > Best, > > -- > Nan Zhu > > > On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote: > > Just a quick

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
> Spark's dependency graph in a maintenance *Modifying* Spark's dependency graph...

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Hey Evan and TD, Spark's dependency graph in a maintenance release seems potentially harmful, especially upgrading a minor version (not just a patch version) like this. This could affect other downstream users. For instance, now without knowing their fastutil dependency gets bumped and they hit so
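To make the patch-versus-minor distinction concrete, here is a small sketch that classifies a dependency version change, assuming plain `major.minor.patch` strings (qualifiers such as `-SNAPSHOT` are not handled). `VersionBump` and the "patch bumps only" policy in `okForMaintenanceRelease` are illustrative, not an actual Spark build check.

```scala
object VersionBump {
  sealed trait Bump
  case object NoChange extends Bump
  case object Patch extends Bump
  case object Minor extends Bump
  case object Major extends Bump

  // Parses "major.minor.patch"; anything else is rejected.
  private def parse(v: String): (Int, Int, Int) =
    v.split('.').map(_.toInt) match {
      case Array(ma, mi, pa) => (ma, mi, pa)
      case _ => throw new IllegalArgumentException(s"expected major.minor.patch: $v")
    }

  /** Returns the most significant version component that changed. */
  def classify(from: String, to: String): Bump = {
    val (fMa, fMi, fPa) = parse(from)
    val (tMa, tMi, tPa) = parse(to)
    if (fMa != tMa) Major
    else if (fMi != tMi) Minor
    else if (fPa != tPa) Patch
    else NoChange
  }

  // Illustrative conservative policy: a maintenance release may take
  // at most a patch-level dependency bump.
  def okForMaintenanceRelease(from: String, to: String): Boolean =
    classify(from, to) match {
      case NoChange | Patch => true
      case _ => false
    }
}
```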
