Re: PMML support in spark

2013-11-07 Thread Matei Zaharia
Hi Pranay, I don’t think anyone’s working on this right now, but contributions would be welcome if this is a thing we could plug into MLlib. Matei On Nov 6, 2013, at 8:44 PM, Pranay Tonpay pranay.ton...@impetus.co.in wrote: Hi, Wanted to know if PMML support in Spark is on the roadmap

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-07 Thread Patrick Wendell
No problem - thanks for helping us diagnose this! On Tue, Nov 5, 2013 at 5:04 AM, Yadid Ayzenberg ya...@media.mit.edu wrote: Ah, I see. Thanks very much for your assistance, Patrick and Reynold. As a workaround for now, I implemented the SC field as transient and it's working fine. Yadid On
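The workaround mentioned above (marking a field transient so serialization skips it) can be sketched in isolation. This is a minimal illustration using plain Java serialization and no Spark at all; `Handle`, `PlainJob`, `TransientJob` and `canSerialize` are hypothetical names invented for the example, and `Handle` merely stands in for whatever non-serializable object the SC field held in the original thread.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an object that cannot be serialized.
class Handle(val name: String) // deliberately not Serializable

// Keeps the handle as an ordinary field, so serializing a PlainJob
// also tries to serialize the handle and fails.
class PlainJob(val handle: Handle, val threshold: Int) extends Serializable

// Marks the handle @transient, so Java serialization skips that field.
class TransientJob(h: Handle, val threshold: Int) extends Serializable {
  @transient val handle: Handle = h
}

object TransientDemo {
  // Returns true if `value` can be written with plain Java serialization.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val handle = new Handle("ctx")
    println(canSerialize(new PlainJob(handle, 10)))     // handle is a regular field
    println(canSerialize(new TransientJob(handle, 10))) // handle is skipped
  }
}
```

One consequence worth keeping in mind: a transient field is not restored on the receiving side, so code that runs after deserialization should not rely on it being set.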

Spark and geospatial data

2013-11-07 Thread Rob Emanuele
Hello, I'm a developer on the GeoTrellis project (http://geotrellis.github.io). We do fast raster processing over large data sets, from web-time (sub-100ms) processing for live endpoints to distributed raster analysis over clusters using Akka clustering. There's currently discussion underway

Re: Spark and geospatial data

2013-11-07 Thread andy petrella
Hello Rob, As you may know, I have long experience with geospatial data, and I'm now investigating Spark... So I'd be very interested in further answers, but also in helping to move this great idea forward! For instance, I'd say that implementing classical geospatial algorithms like

Re: Spark and geospatial data

2013-11-07 Thread Rob Emanuele
Hi Andy, There would be a large architectural design effort if we decided to support Spark, or replace our current internal actor system with Spark. My thoughts are that the Spark DAG would be fully utilized in tracking lineage and scheduling tasks for the Spark backend, while our current Actor

Where is reduceByKey?

2013-11-07 Thread Philip Ogren
On the front page of the Spark website (http://spark.incubator.apache.org/) there is the following simple word count implementation: file = spark.textFile("hdfs://...") file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _) The same code can be found in the Quick Start

what happens if default parallelism is set to 4x cores?

2013-11-07 Thread Walrus theCat
Will that cause a hit to performance or cause the program to crash? Thanks

Re: Performance drop / unstable in 0.8 release

2013-11-07 Thread Wenlei Xie
Hi, I have all the code for the previous 0.8 version. But how can I find out the SNAPSHOT version there? (In project/SparkBuild.scala it just says version := "0.8.0-SNAPSHOT") Best, Wenlei On Wed, Nov 6, 2013 at 12:09 AM, Reynold Xin r...@apache.org wrote: I don't even think task stealing /

Re: suppressing logging in REPL

2013-11-07 Thread Shay Seng
It seems that I need to have the log4j.properties file in the current directory. So if I launch spark-shell in spark/conf, I see that INFO is not displayed. On Thu, Nov 7, 2013 at 2:16 PM, Shay Seng s...@1618labs.com wrote: When is the log4j.properties file read... and how can I verify that it

Spark Summit agenda posted

2013-11-07 Thread Matei Zaharia
Hi everyone, We're glad to announce the agenda of the Spark Summit, which will happen on December 2nd and 3rd in San Francisco. We have 5 keynotes and 24 talks lined up, from 18 different companies. Check out the agenda here: http://spark-summit.org/agenda/. This will be the biggest Spark

Re: cluster hangs for no apparent reason

2013-11-07 Thread Shangyu Luo
I am not sure. But in their RDD paper they mention the use of broadcast variables. Sometimes you may need a local variable in many map-reduce jobs and you do not want to copy it to all worker nodes multiple times. Then a broadcast variable is a good choice. 2013/11/7 Walrus theCat

Re: Where is reduceByKey?

2013-11-07 Thread Philip Ogren
Thanks - I think this would be a helpful note to add to the docs. I went and read a few things about Scala implicit conversions (I'm obviously new to the language), and it seems like a very powerful language feature. Now that I know about them, it will certainly be easy to identify when they
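For readers who, like the poster, are new to implicit conversions, here is a minimal sketch of how a method such as reduceByKey can appear on a type that does not define it. It uses plain Scala collections rather than Spark; `PairOps`, `PairImplicits` and `WordCountDemo` are hypothetical names made up for the illustration and are not Spark classes. To my understanding, Spark itself follows the same pattern, with the method living in a separate pair-functions class that is brought into scope by importing the SparkContext companion's members, but treat that as background rather than something this thread confirms.

```scala
import scala.language.implicitConversions

// Hypothetical wrapper that adds a key-based reduce to sequences of pairs.
class PairOps[K, V](pairs: Seq[(K, V)]) {
  // Combine all values that share a key using `f`.
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

object PairImplicits {
  // With this conversion in scope, any Seq[(K, V)] appears to have reduceByKey.
  implicit def toPairOps[K, V](pairs: Seq[(K, V)]): PairOps[K, V] = new PairOps(pairs)
}

object WordCountDemo {
  import PairImplicits._ // without this import, the reduceByKey call below does not compile

  // Same shape as the word count quoted earlier in the thread, on a local Seq.
  def count(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit =
    println(count(Seq("a b a", "b a")))
}
```

The point of the sketch is the failure mode discussed in this thread: if the import is missing, the compiler only reports that reduceByKey is not a member of the sequence type, with no hint that a conversion exists elsewhere.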

Re: Where is reduceByKey?

2013-11-07 Thread Matei Zaharia
Yeah, it’s true that this feature doesn’t provide any way to give good error messages. Maybe some IDEs will support it eventually, though I haven’t seen it. Matei On Nov 7, 2013, at 3:46 PM, Philip Ogren philip.og...@oracle.com wrote: Thanks - I think this would be a helpful note to add to