Resubmission due to a fetch failure

2013-11-25 Thread Grega Kešpret
Hi! We use Spark to process logs in batches and persist the end result in a db. Last week, we re-ran the job on the same data a couple of times, only to find that one run had more results than the rest. Digging through the logs, we found out that a task had been lost and marked for resubmission. I

Re: Problem with Multi-user In Spark

2013-11-25 Thread prabeesh k
Hi Patrick, I am getting the following warning while running as a second user: WARN component.AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at
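The warning above says that something tried to listen on port 4040 while another process already held it. A minimal sketch, using plain `java.net.ServerSocket` rather than anything from Spark or Jetty, reproduces the same `java.net.BindException`. The class name `PortConflictSketch` and its method are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortConflictSketch {

    /**
     * Binds one listener to a free port picked by the OS, then tries to bind
     * a second listener to the same port. Returns true if the second bind is
     * rejected with BindException, which is the failure seen in the log.
     */
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // not expected while the first listener is open
            } catch (BindException e) {
                return true;  // "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```

If the same thing is happening here, giving the second user's process a different port should avoid the clash. Spark's configuration documentation lists a `spark.ui.port` property for the application web UI; whether it applies to this setup and version is an assumption.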

Error when running CassandraTest

2013-11-25 Thread Pulasthi Supun Wickramasinghe
Hi All, I am trying to run the CassandraTest sample but I am running into an error when creating the Spark context: val sc = new SparkContext("spark://pulasthi-laptop:7077", "casDemo"); This line causes the following error. I don't understand why mesos/Scheduler is needed. I have set up a local

Re: Spark driver behind NAT

2013-11-25 Thread Dmitriy Lyubimov
thank you, guys. much appreciated. -d On Sun, Nov 24, 2013 at 10:12 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yup, it’s also important to have low latency between the drivers and the workers. If you plan to expose this to the outside (e.g. offer a shell interface), it would be better

Re: local[k] job gets stuck - spark 0.8.0

2013-11-25 Thread Patrick Wendell
When it gets stuck, what does it show in the web UI? Also, can you run a jstack on the process and attach the output... that might explain what's going on. On Mon, Nov 25, 2013 at 11:30 AM, Vijay Gaikwad vijay...@gmail.com wrote: I am using apache spark 0.8.0 to process a large data file and

Re: spark-shell not working on standalone cluster (java.io.IOException: Cannot run program compute-classpath.sh)

2013-11-25 Thread Aaron Davidson
There is a pull request currently to fix this exact issue, I believe, at https://github.com/apache/incubator-spark/pull/192. It's very small and only touches the script files, so you could apply it to your current version and distribute it to the workers. The fix here is that you add an additional

Re: spark-shell not working on standalone cluster (java.io.IOException: Cannot run program compute-classpath.sh)

2013-11-25 Thread Grega Kešpret
Thanks, will try it out! Grega -- *Grega Kešpret* Analytics engineer Celtra — Rich Media Mobile Advertising celtra.com http://www.celtra.com/ | @celtramobile http://www.twitter.com/celtramobile On Mon, Nov 25, 2013 at 11:54 PM, Aaron Davidson ilike...@gmail.com wrote:

Re: How to run compiled .jar in Spark?

2013-11-25 Thread Michael Kun Yang
thanks On Mon, Nov 25, 2013 at 3:00 PM, Ankur Chauhan achau...@brightcove.com wrote: Have a look at https://github.com/apache/incubator-spark/blob/master/run-example That should help you figure out how to run a jar file using spark (given you have the classpath/dependencies set up). --

Re: Kryo serialization for shuffles

2013-11-25 Thread Mayuresh Kunjir
This shows how to serialize user classes. I wanted Spark to serialize all shuffle files and object files using Kryo. How can I specify that? Or would that be done by default if I just set spark.serializer to kryo? On Mon, Nov 25, 2013 at 7:42 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Re: Kryo serialization for shuffles

2013-11-25 Thread Andrew Ash
How do you know Spark doesn't also use Kryo for shuffled files? Are there metrics or logs somewhere that make you believe it's normal Java serialization? On Mon, Nov 25, 2013 at 4:46 PM, Mayuresh Kunjir mayuresh.kun...@gmail.com wrote: This shows how to serialize user classes. I wanted Spark

Re: Kryo serialization for shuffles

2013-11-25 Thread Andrew Ash
Hi Matei, I've clarified the documentation to include this information in this pull request. Can you take a look? https://github.com/apache/incubator-spark/pull/206 On Mon, Nov 25, 2013 at 5:03 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yeah, if you just say spark.serializer to Kryo, it
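For readers following this thread, here is a minimal sketch of setting `spark.serializer` to Kryo from a JVM driver. It assumes the Spark 0.8 convention of configuring through Java system properties before the context is created, and it assumes the serializer class is `org.apache.spark.serializer.KryoSerializer` (the 0.8 package naming). The class `KryoSerializerSketch` and its method are hypothetical names used only for illustration. The sketch sets the property and nothing more, so it runs without Spark on the classpath.

```java
public class KryoSerializerSketch {

    // Assumed class name for Spark 0.8; check it against the version you run.
    static final String KRYO = "org.apache.spark.serializer.KryoSerializer";

    /** Sets the spark.serializer system property and returns the stored value. */
    static String enableKryo() {
        System.setProperty("spark.serializer", KRYO);
        return System.getProperty("spark.serializer");
    }

    public static void main(String[] args) {
        System.out.println("spark.serializer = " + enableKryo());
        // A real driver would create its Spark context only after this point,
        // so that the property is already set when the context is created.
    }
}
```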

Re: step-by-step recipe for running spark 0.8 on ec2

2013-11-25 Thread Walrus theCat
Andrew, I don't think so. I run Spark on EC2 all the time. It only started crashing when I upgraded my project to 0.8. The python script gets the latest AMI, which has Spark 0.7.3 installed (not 0.8). There has to be some standard procedure, probably involving git pull and the copy-dir

Re: step-by-step recipe for running spark 0.8 on ec2

2013-11-25 Thread Ashish Rangole
Hi Walrus theCat, We have been successfully using Spark 0.8 on EC2 ever since it was released and we do this several times a day. We use spark-ec2.py with the new version option (--spark-version=0.8.0) to spin up the Spark 0.8 cluster on EC2. The key is to use the new spark-ec2.py and not the