Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Sean Owen
+1 (non-binding) Signatures and license look good. I built the plain-vanilla distribution and ran tests. While I still see the Java 8 + Hive test failure, I think we've established this is ignorable. On Wed, Nov 19, 2014 at 11:51 PM, Andrew Or and...@databricks.com wrote: I will start with a

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Madhu
Thanks Patrick. I've been testing some 1.2 features, looks good so far. I have some example code that I think will be helpful for certain MR-style use cases (secondary sort). Can I still add that to the 1.2 documentation, or is that frozen at this point? -- Madhu
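
For readers unfamiliar with the pattern, here is a minimal sketch of what "secondary sort" means in the MR sense, written in plain Python rather than Spark. It is not Madhu's example code, which does not appear in this archive; the function name, record layout, and partition count are all hypothetical. The idea is to partition on the primary key only, sort each partition on the composite (key, secondary) pair, and then group on the key alone so each group's values arrive already ordered.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(records, num_partitions=2):
    """Group (key, secondary, payload) records by key, values ordered by secondary.

    Hypothetical helper for illustration only; it mimics the shuffle steps
    in memory and is not a Spark API.
    """
    # Step 1: partition on the primary key only, so every record for a
    # given key lands in the same partition.
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[hash(rec[0]) % num_partitions].append(rec)

    result = {}
    for part in partitions:
        # Step 2: sort within the partition on the composite (key, secondary).
        part.sort(key=itemgetter(0, 1))
        # Step 3: group on the key alone; values are already in secondary order.
        for key, group in groupby(part, key=itemgetter(0)):
            result[key] = [(sec, payload) for _, sec, payload in group]
    return result


if __name__ == "__main__":
    data = [("a", 3, "x"), ("b", 1, "y"), ("a", 1, "z"), ("a", 2, "w")]
    print(secondary_sort(data))
```

The point of the design is that no per-key list ever has to be sorted after grouping: the ordering comes from the partition-level sort.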

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Corey Nolet
I was actually about to post this myself. I have a complex join that could benefit from something like a GroupComparator vs having to do multiple groupBy operations. This is probably the wrong thread for a full discussion on this but I didn't see a JIRA ticket for this or anything similar- any

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Nan Zhu
BTW, this PR https://github.com/apache/spark/pull/2524 is related to a blocker-level bug, and it is actually close to being merged (it has been reviewed for several rounds). I would appreciate it if anyone could continue the process, @mateiz -- Nan Zhu http://codingcat.me On Thursday, November

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread slcclimber
+1 Built successfully and ran the python examples.

[important] jenkins down

2014-11-20 Thread shane knapp
i noticed that there were no builds, and that jenkins is throwing a bunch of exceptions in the log file. i'm looking into this right now and will update when i get things rolling again. sorry for the inconvenience, shane

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Hector Yee
I'm getting a lot of task lost with this build in a large mesos cluster. Happens with both hash and sort shuffles. 14/11/20 18:08:38 WARN TaskSetManager: Lost task 9.1 in stage 1.0 (TID 897, i-d4d6553a.inst.aws.airbnb.com): FetchFailed(null, shuffleId=1, mapId=-1, reduceId=9, message=

Re: [important] jenkins down

2014-11-20 Thread shane knapp
ok, we're back up and building now... looks like there was a seriously bad git (or github) plugin update that caused all sorts of unintended consequences, mostly with cron stacktracing. i'll take a closer look and see if i can find out exactly what happened, but suffice to say, we'll be really

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
I'm still seeing the fetch failed error and updated https://issues.apache.org/jira/browse/SPARK-3633 On Thu, Nov 20, 2014 at 10:21 AM, Marcelo Vanzin van...@cloudera.com wrote: +1 (non-binding) . ran simple things on spark-shell . ran jobs in yarn client cluster modes, and standalone

Re: Implementing TinkerPop on top of GraphX

2014-11-20 Thread Kushal Datta
I have also added a graphx-gremlin module in the Tinkerpop3 codebase. Right now a GraphX graph can be instantiated from the Gremlin command line (in a similar manner a Giraph graph is instantiated) and the g.V().count() function calls the count() method on RDDs. Please check out the code in:

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Matei Zaharia
You can still send patches for docs until the release goes out -- please do if you see stuff. Matei On Nov 20, 2014, at 6:39 AM, Madhu ma...@madhu.com wrote: Thanks Patrick. I've been testing some 1.2 features, looks good so far. I have some example code that I think will be helpful for

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Nishkam Ravi
Seeing issues with sort-based shuffle (OOM errors and memory leak): https://issues.apache.org/jira/browse/SPARK-4515. Good performance gains for TeraSort as compared to hash (as expected). Thanks, Nishkam On Thu, Nov 20, 2014 at 11:20 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You can

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
I think it is a race condition caused by netty deactivating a channel while it is active. Switched to nio and it works fine --conf spark.shuffle.blockTransferService=nio On Thu, Nov 20, 2014 at 10:44 AM, Hector Yee hector@gmail.com wrote: I'm still seeing the fetch failed error and updated

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
This is whatever was in http://people.apache.org/~andrewor14/spark-1.1.1-rc2/ On Thu, Nov 20, 2014 at 11:48 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hector, is this a comment on 1.1.1 or on the 1.2 preview? Matei On Nov 20, 2014, at 11:39 AM, Hector Yee hector@gmail.com wrote:

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Matei Zaharia
Ah, I see. But the spark.shuffle.blockTransferService property doesn't exist in 1.1 (AFAIK) -- what exactly are you doing to get this problem? Matei On Nov 20, 2014, at 11:50 AM, Hector Yee hector@gmail.com wrote: This is whatever was in

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
Whoops I must have used the 1.2 preview and mixed them up. spark-shell -version shows version 1.2.0 Will update the bug https://issues.apache.org/jira/browse/SPARK-4516 to 1.2 On Thu, Nov 20, 2014 at 11:59 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Ah, I see. But the

Spark Streaming Metrics

2014-11-20 Thread Gerard Maas
As the Spark Streaming tuning guide indicates, the key indicators of a healthy streaming job are: - Processing Time - Total Delay The Spark UI page for the Streaming job [1] shows these two indicators but the metrics source for Spark Streaming (StreamingSource.scala) [2] does not. Any reasons
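
To make the two indicators concrete, here is a rough sketch of how they can be derived from per-batch timestamps, in plain Python. It assumes the usual definitions (processing time is end minus start of processing; total delay is scheduling delay plus processing time). The class and field names are hypothetical and are not taken from StreamingSource.scala or any Spark API.

```python
from dataclasses import dataclass


@dataclass
class BatchTimes:
    """Hypothetical per-batch timestamps, in milliseconds."""
    submitted_ms: int   # when the batch was handed to the scheduler
    started_ms: int     # when processing actually began
    finished_ms: int    # when processing completed

    @property
    def scheduling_delay_ms(self) -> int:
        # Time the batch waited in the queue before processing started.
        return self.started_ms - self.submitted_ms

    @property
    def processing_time_ms(self) -> int:
        # Time spent actually processing the batch.
        return self.finished_ms - self.started_ms

    @property
    def total_delay_ms(self) -> int:
        # Scheduling delay plus processing time.
        return self.scheduling_delay_ms + self.processing_time_ms


def is_keeping_up(batch: BatchTimes, batch_interval_ms: int) -> bool:
    """A job keeps up when each batch is processed within one batch interval."""
    return batch.processing_time_ms <= batch_interval_ms


if __name__ == "__main__":
    b = BatchTimes(submitted_ms=1000, started_ms=1200, finished_ms=1900)
    print(b.processing_time_ms, b.total_delay_ms, is_keeping_up(b, 1000))
```

A total delay that keeps growing across batches, even while processing time looks acceptable, is the signal that batches are queuing up.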

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-20 Thread Joseph Bradley
Could we move discussion of the design and implementation to the JIRA and/or a work-in-progress PR (tagged with [WIP])? That will help leave a record for the future. Thanks! Joseph On Wed, Nov 19, 2014 at 9:59 PM, Ashutosh ashutosh.triv...@iiitb.org wrote: Done. Thanks. Added you as a

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-20 Thread slcclimber
That would be a very wise decision. On Nov 20, 2014 3:53 PM, Joseph Bradley [via Apache Spark Developers List] ml-node+s1001551n9467...@n3.nabble.com wrote: Could we move discussion of the design and implementation to the JIRA and/or a work-in-progress PR (tagged with [WIP])? That will help

Re: Eliminate copy while sending data : any Akka experts here ?

2014-11-20 Thread Reynold Xin
Can you elaborate? Not 100% sure if I understand what you mean. On Thu, Nov 20, 2014 at 7:14 PM, Shixiong Zhu zsxw...@gmail.com wrote: Is it possible that Spark buffers the messages of mapOutputStatuses(Array[Byte]) according to the size of mapOutputStatuses which have already sent but not

sbt publish-local fails, missing spark-network-common

2014-11-20 Thread pedrorodriguez
I am developing an application which calls into Spark MLlib code I am working on (LDA). To do so, I am linking Spark locally in the application and using sbt assembly/publish-local in the spark directory. When I run sbt assembly in my application I get the following error: $ sbt assembly [info]

Spark development with IntelliJ

2014-11-20 Thread Patrick Wendell
Hi All, I noticed people sometimes struggle to get Spark set up in IntelliJ. I'd like to maintain comprehensive instructions on our Wiki to make this seamless for future developers. Due to some nuances of our build, getting to the point where you can build + test every module from within the IDE