Spark Streaming Maven Build

2014-03-04 Thread Bin Wang
Hi there, I tried the Kafka WordCount example and it works perfectly, and the code is pretty straightforward to understand. Can anyone show me how to start my own Maven project with the KafkaWordCount example with minimum effort? 1. What the pom file should look like (including jar-plugin? ass
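For reference, a minimal sketch of what such a project's driver could look like, assuming Spark 0.9's streaming Kafka API (the pom would need a dependency on something like spark-streaming-kafka_2.10, version 0.9.0-incubating); the ZooKeeper quorum, consumer group, and topic below are placeholders:

  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.StreamingContext._
  import org.apache.spark.streaming.kafka.KafkaUtils

  object MyKafkaWordCount {
    def main(args: Array[String]) {
      val ssc = new StreamingContext("local[2]", "MyKafkaWordCount", Seconds(2))
      // one receiver thread for the placeholder topic
      val lines = KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))
      val counts = lines.map(_._2)                 // drop the Kafka key, keep the message
                        .flatMap(_.split(" "))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)
      counts.print()
      ssc.start()
      ssc.awaitTermination()
    }
  }

Note that the Kafka integration classes are not part of the core Spark assembly, so they usually need to be bundled into the application jar (for example with the shade or assembly plugin).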

Re: Missing Spark URL after starting the master

2014-03-04 Thread Bin Wang
Hi Mayur, I am using CDH4.6.0p0.26, and the latest Cloudera Spark parcel is Spark 0.9.0 CDH4.6.0p0.50. As I mentioned, somehow the Cloudera Spark version doesn't contain the run-example shell scripts. However, it is automatically configured and it is pretty easy to set up across the cluster...

trying to understand job cancellation

2014-03-04 Thread Koert Kuipers
I have a running job that I cancel while keeping the Spark context alive. At the time of cancellation the active stage is 14. I see in the logs: 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c 2014/03/04 16:43:19 INFO scheduler.TaskSched
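For context, a hedged sketch of the job-group pattern those log lines correspond to, assuming the standard SparkContext API (the group id is simply the one from the log; the work itself is made up):

  import org.apache.spark.SparkContext

  // Sketch: jobs submitted from this thread after setJobGroup belong to the group
  // and can be cancelled together while the SparkContext stays alive.
  def runCancellableWork(sc: SparkContext) {
    sc.setJobGroup("3a25db23-2e39-4497-b7ab-b26b2a976f9c", "illustrative long-running job")
    val n = sc.parallelize(1 to 1000000).map(_ * 2).count()
    println(n)
  }

  // From another thread:
  // sc.cancelJobGroup("3a25db23-2e39-4497-b7ab-b26b2a976f9c")

Cancellation is asynchronous, so tasks from the active stage can keep logging briefly after the cancel request is acknowledged.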

Re: Actors and sparkcontext actions

2014-03-04 Thread Ognen Duzlevski
Deb, On 3/4/14, 9:02 AM, Debasish Das wrote: Hi Ognen, Any particular reason for choosing Scalatra over options like Play or Spray? Is Scalatra much better at serving APIs, or is it due to its similarity with Ruby's Sinatra? Did you try the other options and then pick Scalatra? Not really. I

Re: Missing Spark URL after starting the master

2014-03-04 Thread Mayur Rustagi
I have a write-up for the Cloudera VM (http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Cloudera_VM). Which version are you trying to set up on Cloudera? Also, which Cloudera version are you using? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

how to add rddID to tuples in DStream

2014-03-04 Thread Adrian Mocanu
Hi, I have another simple question: I have a DStream and want to add the ID of each RDD to the tuples contained in that RDD and return a DStream of them. So far I've figured out how to do this via foreachRDD, returning RDDs. How do I create a DStream from these RDDs? Or even better, how to use .map to
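One way that seems to fit here (a sketch, not the only option) is DStream.transform, which exposes each batch's underlying RDD so its id can be attached to every element while still returning a DStream; the element type below is only illustrative:

  import org.apache.spark.streaming.dstream.DStream

  // Sketch: tag every element with the id of the RDD (batch) it belongs to.
  def tagWithRddId(stream: DStream[(String, Int)]): DStream[(Int, (String, Int))] =
    stream.transform { rdd =>
      val rddId = rdd.id                 // evaluated per batch on the driver
      rdd.map(elem => (rddId, elem))
    }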

Re: o.a.s.u.Vector instances for equality

2014-03-04 Thread Oleksandr Olgashko
Thanks. Does it make sense to add an ==/equals method to Vector with this (or similar) behavior? 2014-03-04 6:00 GMT+02:00 Shixiong Zhu : > Vector is an enhanced Array[Double]. You can compare it like > Array[Double]. E.g., > > scala> val v1 = Vector(1.0, 2.0) > v1: org.apache.spark.util.Vector = (1.
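In the meantime, a hedged workaround is to compare the backing arrays directly (a sketch, assuming util.Vector's public elements field):

  import org.apache.spark.util.Vector

  // Sketch: element-wise equality for o.a.s.u.Vector via its backing array.
  def vectorsEqual(a: Vector, b: Vector): Boolean =
    a.elements.sameElements(b.elements)

  val v1 = Vector(1.0, 2.0)
  val v2 = Vector(1.0, 2.0)
  println(vectorsEqual(v1, v2))   // true; plain v1 == v2 falls back to reference equality and is false here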

Re: sstream.foreachRDD

2014-03-04 Thread Soumya Simanta
I think you need to call collect . > On Mar 4, 2014, at 11:18 AM, Adrian Mocanu wrote: > > Hi > I’ve noticed that if in the driver of a spark app I have a foreach and add > stream elements to a list from the stream, the list contains no elements at > the end of the processing. > > Take this
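A sketch of that suggestion, for reference: the closure given to rdd.foreach runs on the executors, so appending to a driver-side list there is invisible to the driver; collect() brings each batch back to the driver first. The element type and buffer name are illustrative:

  import scala.collection.mutable.ArrayBuffer
  import org.apache.spark.streaming.dstream.DStream

  val results = new ArrayBuffer[(String, Int)]()   // lives on the driver

  def gatherToDriver(sstream: DStream[(String, Int)]) {
    sstream.foreachRDD { rdd =>
      results ++= rdd.collect()   // collect() runs on the driver once the batch finishes
    }
  }

This only makes sense when each batch is small enough to fit in driver memory.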

sstream.foreachRDD

2014-03-04 Thread Adrian Mocanu
Hi, I've noticed that if, in the driver of a Spark app, I have a foreach that adds stream elements to a list, the list contains no elements at the end of the processing. Take this sample code: val list= new java.util.List() sstream.foreachRDD (rdd => rdd.foreach( tuple => list.add

Fwd: [Scikit-learn-general] Spark+sklearn sprint outcome ?

2014-03-04 Thread Nick Pentreath
I thought Spark users might be interested in the outcome of the Spark / scikit-learn sprint that happened last month just after Strata... -- Forwarded message -- From: Olivier Grisel Date: Fri, Feb 21, 2014 at 6:30 PM Subject: Re: [Scikit-learn-general] Spark+sklearn sprint outc

Re: Actors and sparkcontext actions

2014-03-04 Thread Debasish Das
Hi Ognen, Any particular reason for choosing Scalatra over options like Play or Spray? Is Scalatra much better at serving APIs, or is it due to its similarity with Ruby's Sinatra? Did you try the other options and then pick Scalatra? Thanks. Deb On Tue, Mar 4, 2014 at 4:50 AM, Ognen Duzlevski

Re: RDD Manipulation in Scala.

2014-03-04 Thread trottdw
Thanks Sean, I think that is doing what I needed; it was much simpler than what I had been attempting. Is it possible to do an OR filter, so that, for example, column 2 is filtered on "A2" appearances and column 3 on "A4"? -- View this message in context: http://apache-spark-u
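For the OR case, a sketch along the same lines as the filter suggested in this thread (splitting once and combining the column tests with ||; the 0-based indexes 1 and 2 assume "column 2" and "column 3" of the tab-separated line):

  import org.apache.spark.rdd.RDD

  // Sketch: keep a line if its second column is "A2" OR its third column is "A4".
  def filterRows(data: RDD[String]): RDD[String] =
    data.filter { line =>
      val cols = line.split("\t")
      cols(1) == "A2" || cols(2) == "A4"
    }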

Re: RDD Manipulation in Scala.

2014-03-04 Thread Sean Owen
data.filter(_.split("\t")(1) == "A2") ? -- Sean Owen | Director, Data Science | London On Tue, Mar 4, 2014 at 1:06 PM, trottdw wrote: > Hello, I am using Spark with Scala and I am attempting to understand the > different filtering and mapping capabilities available. I haven't found an > exampl

RDD Manipulation in Scala.

2014-03-04 Thread trottdw
Hello, I am using Spark with Scala and I am attempting to understand the different filtering and mapping capabilities available. I haven't found an example of the specific task I would like to do. I am trying to read in a tab-separated text file and filter specific entries. I would like this filter

Re: Actors and sparkcontext actions

2014-03-04 Thread Ognen Duzlevski
Suraj, I posted a link to my blog to this list where I detail how to do a simple actor/SparkContext setup, with the added obstacle of it being inside a Scalatra servlet. Thanks for the code! Ognen On 3/4/14, 3:20 AM, Suraj Satishkumar Sheth wrote: Hi Ognen, See if this helps. I was working on

Problem with Spark on Mesos

2014-03-04 Thread juanpedromoreno
Hi, I'm using Vagrant. I've built a cluster with Mesos and Spark with the following structure: - 2 ZooKeeper nodes - 2 master nodes - 3 slave nodes On each master node, I've installed Mesos and Spark. My $SPARK_HOME/conf/spark_env.sh contains: export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmes
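For comparison, a hedged sketch of pointing a SparkContext at a ZooKeeper-backed Mesos master; every host name, port, and path below is a placeholder rather than part of the original setup:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: connect to Mesos masters registered in ZooKeeper.
  val conf = new SparkConf()
    .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos")
    .setAppName("MesosSmokeTest")
    .set("spark.executor.uri", "hdfs://namenode/frameworks/spark-0.9.0.tgz")  // where slaves fetch Spark

  val sc = new SparkContext(conf)
  println(sc.parallelize(1 to 100).count())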

RE: Problem with "delete spark temp dir" on spark 0.8.1

2014-03-04 Thread Chen Jingci
Hi, I also encountered the same problem when running locally, but when I ran on the cluster, everything was fine. Then I ran locally again without the jars parameter and the exception disappeared. Best regards, Chen jingci --sent from phone, sorry for the typo -Original Message- From: "goi cto" Sen

Re: Problem with "delete spark temp dir" on spark 0.8.1

2014-03-04 Thread goi cto
Exception in thread "delete Spark temp dir C:\Users\..." java.io.IOException: failed to delete: C:\Users\...\simple-project-1.0.jar" at org.apache.spark.util.utils$.deleteRecursively(Utils.scala:495) at org.apache.spark.util.utils$$anonfun$deleteRecursively$1.apply(Utils.scala:491) I deleted my

Re: Problem with "delete spark temp dir" on spark 0.8.1

2014-03-04 Thread Akhil Das
Hi, Try cleaning your temp dir, System.getProperty("java.io.tmpdir"). Also, can you paste a longer stack trace? Thanks Best Regards On Tue, Mar 4, 2014 at 2:55 PM, goi cto wrote: > Hi, > > I am running a spark java program on a local machine. when I try to write > the output to a file (RDD.

Fwd: Problem with "delete spark temp dir" on spark 0.8.1

2014-03-04 Thread goi cto
Hi, I am running a Spark Java program on a local machine. When I try to write the output to a file (RDD.saveAsTextFile) I am getting this exception: Exception in thread "Delete Spark temp dir ..." This is running on my local Windows machine. Any ideas? -- Eran | CTO

RE: Actors and sparkcontext actions

2014-03-04 Thread Suraj Satishkumar Sheth
Hi Ognen, See if this helps. I was working on this : class MyClass[T](sc : SparkContext, flag1 : Boolean, rdd : RDD[T], hdfsPath : String) extends Actor { def act(){ if(flag1) this.process() else this.count } private def process(){ println(sc.textFile(hdfsPath).count) //
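A compacted, self-contained variant of the same idea (a sketch only; it assumes a SparkContext shared with the actor and uses the old scala.actors API that the snippet above appears to use; the path is a placeholder):

  import scala.actors.Actor
  import org.apache.spark.SparkContext

  // Sketch: an actor that owns a reference to a shared SparkContext and
  // runs a count against HDFS when started.
  class CountActor(sc: SparkContext, hdfsPath: String) extends Actor {
    def act() {
      println(sc.textFile(hdfsPath).count())
    }
  }

  // usage: new CountActor(sc, "hdfs:///data/input.txt").start()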

RE: Connection Refused When Running SparkPi Locally

2014-03-04 Thread Li, Rui
I've encountered similar problems. Maybe you can try using the hostname or FQDN (rather than the IP address) of your node for the master URI. In my case, Akka picks the FQDN for the master URI and the worker has to use exactly the same string to connect. From: Benny Thompson [mailto:ben.d.tho...@gmail.com]
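A sketch of that workaround (the host name is a placeholder; the point is that the string must match what the master advertised at startup, e.g. its FQDN rather than its IP):

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: use the exact FQDN the master logged/advertised, not the raw IP.
  val conf = new SparkConf()
    .setMaster("spark://master01.example.internal:7077")
    .setAppName("SparkPi")

  val sc = new SparkContext(conf)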

Re: Shuffle Files

2014-03-04 Thread Aniket Mokashi
From the BlockManager and ShuffleMapTask code, it writes under spark.local.dir or java.io.tmpdir: val diskBlockManager = new DiskBlockManager(shuffleBlockManager, conf.get("spark.local.dir", System.getProperty("java.io.tmpdir"))) On Mon, Mar 3, 2014 at 10:45 PM, Usman Ghani wrote: > Whe
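A hedged sketch of steering those shuffle files to a known location instead of the java.io.tmpdir fallback (the directory is a placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: shuffle and spill files go under spark.local.dir when it is set.
  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("ShuffleDirDemo")
    .set("spark.local.dir", "/mnt/fast-disk/spark-tmp")

  val sc = new SparkContext(conf)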