Re: Spark hangs on bad Mesos slave

2013-12-11 Thread Gary Malouf
As an addendum, I see a large number of the following in the mesos slave info logs: W1211 05:44:37.057456 14205 monitor.cpp:186] Failed to collect resource usage for executor '201312061449-1315739402-5050-23513-0' of framework '201312061449-1315739402-5050-23513-0026': Future discarded W1211

Re: Fwd: Spark forum question

2013-12-11 Thread Philip Ogren
You might try a more standard Windows path. I typically write to a local directory such as target/spark-output. On 12/11/2013 10:45 AM, Nathan Kronenfeld wrote: We are trying to test out running Spark 0.8.0 on a Windows box, and while we can get it to run all the examples that don't output
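
A minimal sketch of that suggestion, for reference (the RDD contents, app name, and output directory are illustrative, and this is the Spark 0.8-era Scala API):

  import org.apache.spark.SparkContext

  val sc = new SparkContext("local", "WindowsOutputTest")
  val res = sc.parallelize(1 to 100).map(_.toString)
  // Write to a directory relative to the current working directory
  // (e.g. <cwd>\target\spark-output) rather than a drive-letter URI.
  res.saveAsTextFile("target/spark-output")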

Re: Fwd: Spark forum question

2013-12-11 Thread Nathan Kronenfeld
Oops. Stupid mail client. Sorry about that. When we change res.saveAsTextFile("file:///c:/some/path") to res.saveAsTextFile(path) and run it from c:\some, we get exactly the same error. -- Nathan Kronenfeld Senior Visualization Developer Oculus Info Inc 2 Berkeley Street, Suite 600, Toronto,

Re: Spark Vs R (Univariate Kernel Density Estimation)

2013-12-11 Thread Imran Rashid
These are just thoughts off the top of my head: 1) If the original R code runs in 3 secs, you are unlikely to be able to improve on that drastically with Spark. Yes, Spark can run sub-second jobs, but no matter what, don't expect Spark to get you into the 10-millisecond range. While Spark has
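
For context, a univariate kernel density estimate at a query point q is (1/(n*h)) * sum_i K((q - x_i)/h). A minimal Spark sketch of that formula, assuming a Gaussian kernel and made-up sample data and bandwidth (not code from the thread):

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._

  val sc = new SparkContext("local", "KDE")
  val samples = sc.parallelize(Seq(1.2, 0.7, 2.3, 1.9, 0.4))  // observed data x_i
  val h = 0.5                                                  // bandwidth
  def gaussian(u: Double) = math.exp(-0.5 * u * u) / math.sqrt(2 * math.Pi)
  val q = 1.0                                                  // query point
  val n = samples.count()
  // Density estimate: (1 / (n*h)) * sum over samples of K((q - x_i) / h)
  val density = samples.map(x => gaussian((q - x) / h)).sum() / (n * h)
  println(density)

For a handful of points like this the per-job overhead dominates, which is Imran's point about not expecting millisecond-range runtimes.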

Retry instead of die on workers connect failure

2013-12-11 Thread Andrew Ash
Hi Spark users, I'm observing behavior where if a master node goes down for a restart, all the worker JVMs die (in standalone cluster mode). In other cluster computing setups with master-worker relationships (namely Hadoop), if a worker can't connect to the master or its connection drops it
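
Purely to illustrate the behavior being asked for (this is not the actual standalone Worker code), a reconnect loop with capped exponential backoff might look like:

  // Illustrative sketch: retry a connection attempt with backoff instead of
  // exiting the JVM on the first failure. Names and defaults are made up.
  def connectWithRetry(connect: () => Boolean, maxAttempts: Int = 10): Boolean = {
    var attempt = 0
    var delayMs = 1000L
    while (attempt < maxAttempts) {
      if (connect()) return true
      attempt += 1
      Thread.sleep(delayMs)
      delayMs = math.min(delayMs * 2, 60000L)  // cap the backoff at one minute
    }
    false
  }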

Re: Spark Vs R (Univariate Kernel Density Estimation)

2013-12-11 Thread Evan R. Sparks
Agreed with Imran - without knowing the size/shape of the objects in your program, it's tough to tell where the bottleneck is. Additionally, unless the problem is really big, (in terms of your w and x vectors), it's unlikely that you're going to be CPU bound on the cluster - communication and

Re: load a customized hdfs implementation

2013-12-11 Thread Hossein
Hi, The sbt config file is project/SparkBuild.scala. There seem to be some sbt performance issues with assembly; you can probably speed it up by calling sbt/sbt assembly/assembly. --Hossein On Wed, Dec 11, 2013 at 2:28 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all maybe it's a stupid

How to set executor memory for each worker?

2013-12-11 Thread Jyun-Fan Tsai
We have a cluster whose workers have different amounts of memory. The problem we faced is that we first used the Spark 0.8 EC2 script to create 1 master and some slaves using m1.large instances. Each worker has 7.5G of memory and Spark uses about 6G. Everything looks good. However, when we manually added
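
For what it's worth, in the 0.8-era standalone docs the memory a worker offers is controlled by SPARK_WORKER_MEMORY in conf/spark-env.sh on each machine, while the memory an application requests per executor is the spark.executor.memory property set before the SparkContext is created. A rough sketch (master URL and sizes are made up; double-check the property names against your version):

  import org.apache.spark.SparkContext

  // Ask for 4 GB per executor; workers configured with less than this in
  // SPARK_WORKER_MEMORY will not be able to run executors for this app.
  System.setProperty("spark.executor.memory", "4g")
  val sc = new SparkContext("spark://master:7077", "MemoryTest")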

Re: reading LZO compressed file in spark

2013-12-11 Thread Rajeev Srivastava
Hi Stephen, I tried the same LZO file with a simple Hadoop script; this seems to work fine: HADOOP_HOME=/usr/lib/hadoop /usr/bin/hadoop jar /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop-mapreduce/hadoop-streaming.jar \ -libjars
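
If the goal is to read the same file from Spark, one approach worth trying (assuming the hadoop-lzo jar, which provides com.hadoop.mapreduce.LzoTextInputFormat, and its native libraries are on the classpath; the path below is made up) is to go through newAPIHadoopFile instead of sc.textFile:

  import org.apache.hadoop.io.{LongWritable, Text}
  import com.hadoop.mapreduce.LzoTextInputFormat  // from the hadoop-lzo package
  import org.apache.spark.SparkContext

  val sc = new SparkContext("local", "LzoRead")
  // Use the LZO-aware input format directly so splits are decompressed correctly.
  val lines = sc.newAPIHadoopFile("/path/to/file.lzo",
      classOf[LzoTextInputFormat], classOf[LongWritable], classOf[Text],
      sc.hadoopConfiguration)
    .map(_._2.toString)
  println(lines.count())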

Sorting data from sequence files is overly memory intensive

2013-12-11 Thread Matt Cheah
Hi everyone, Really hoping to get some more help on an issue I've been stuck on for a couple of days now. Basically, building the data manually from a text file and converting the text to the objects I'm sorting on doesn't behave the same way as when I import the objects directly from a
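
One thing worth ruling out (an assumption on my part, since the message is cut off here): Hadoop record readers reuse the same Writable instances, so records read via sequenceFile need to be copied into plain objects before they are cached or sorted, whereas data parsed from a text file already consists of fresh objects. A sketch with made-up key/value types and path:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._

  val sc = new SparkContext("local", "SeqFileSort")
  // The map(...) step copies the data out of the reused Writable instances
  // before anything is cached or shuffled for sorting.
  val pairs = sc.sequenceFile("/path/to/data.seq", classOf[LongWritable], classOf[Text])
    .map { case (k, v) => (k.get, v.toString) }
  val sorted = pairs.sortByKey()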

spark avro: caching leads to identical records?

2013-12-11 Thread Robert Fink
Hi, I have a file containing Avro GenericRecords; for debug purposes, let's read one particular field, date_time, and print it to the screen: def sc = new SparkContext("local", "My Spark Context") val job = new org.apache.hadoop.mapreduce.Job // input data: def avrofile =
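
If the identical records only appear once the RDD is cached, one likely culprit (an assumption, since the snippet is truncated) is that the Avro input format reuses a single record object, so the cached RDD ends up holding many references to the same mutated instance. Copying the field out before cache() avoids that; a sketch assuming avro-mapred's AvroKeyInputFormat and an illustrative input path:

  import org.apache.avro.generic.GenericRecord
  import org.apache.avro.mapred.AvroKey
  import org.apache.avro.mapreduce.AvroKeyInputFormat
  import org.apache.hadoop.io.NullWritable
  import org.apache.spark.SparkContext

  val sc = new SparkContext("local", "My Spark Context")
  val records = sc.newAPIHadoopFile("/path/to/data.avro",
    classOf[AvroKeyInputFormat[GenericRecord]],
    classOf[AvroKey[GenericRecord]], classOf[NullWritable],
    sc.hadoopConfiguration)
  // Extract date_time into a plain String *before* cache(), so the cached RDD
  // does not hold references to the one record object the reader keeps reusing.
  val dateTimes = records.map { case (key, _) => key.datum().get("date_time").toString }.cache()
  dateTimes.take(10).foreach(println)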

Scala driver, Python workers?

2013-12-11 Thread Patrick Grinaway
Hi all, I've been mostly using Spark with Python, and it's been a great time (thanks for the earlier help with GPUs, btw), but I recently stumbled through the Scala API and found it incredibly rich, with some options that would be pretty helpful for us but are lacking in the Python API. Is it
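
One partial workaround, if the driver can stay in Scala: RDD.pipe streams each partition through an external command, so a Scala driver can still hand per-record work to a Python process. A sketch (process.py is a hypothetical script that must exist on every worker):

  import org.apache.spark.SparkContext

  val sc = new SparkContext("local", "PipeToPython")
  val nums = sc.parallelize(1 to 10).map(_.toString)
  // Each element is written to the script's stdin, one per line; every line the
  // script prints to stdout becomes an element of the resulting RDD[String].
  val processed = nums.pipe("python process.py")
  processed.collect().foreach(println)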

Re: Re: I need some help

2013-12-11 Thread leosand...@gmail.com
What will be cleaned if I compile Spark with sbt/sbt clean assembly? Actually I find there is a problem in my build output directory sparkhome/assembly/target/scala-2.9.3: there are two jars named spark-assembly-0.8.0-incubating-hadoop2.0.0-cdh4.2.1.jar and

Re: RE: I need some help

2013-12-11 Thread leosand...@gmail.com
NO, I build with sbt. leosand...@gmail.com From: Liu, Raymond Date: 2013-12-12 14:12 To: user@spark.incubator.apache.org Subject: RE: Re: I need some help The latter one sounds to me like it was built by mvn? Best Regards, Raymond Liu From: leosand...@gmail.com [mailto:leosand...@gmail.com]