Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread phoenix bai
Yeah, I know. I can't do anything to improve my network, so all I manage to do is: if it looks like it's hanging, I kill it and restart. So far so good, but it looks like a long way to go. On Wed, Dec 18, 2013 at 3:35 PM, Azuryy Yu wrote: > Hi Phoenix, > This is not Spark related. It was your local network li

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread Azuryy Yu
Hi Phoenix, this is not Spark related; it's your local network that is the limitation. Thanks. On Wed, Dec 18, 2013 at 3:17 PM, phoenix bai wrote: > I am compiling against hadoop 2.2.0, it really takes time, especially since > my network connection is not that stable and all. > > SPARK_HADOOP_VERSION=2.2.0 S

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread phoenix bai
I am compiling against hadoop 2.2.0, and it really takes time, especially since my network connection is not that stable and all. SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt assembly On Wed, Dec 18, 2013 at 10:39 AM, Patrick Wendell wrote: > Hey Philip, > > No - those are compiled against the

Re: FileNotFoundException running spark job

2013-12-17 Thread Nathan Kronenfeld
On Tue, Dec 17, 2013 at 11:05 PM, Azuryy Yu wrote: > I think you need to increase the ulimit to avoid the 'too many open files' error, > then the FileNotFoundException should disappear. > That was our initial thought too... but this is happening on even trivial jobs that worked fine a few days ago. And I'm

Re: FileNotFoundException running spark job

2013-12-17 Thread Azuryy Yu
I think you need to increase the ulimit to avoid the 'too many open files' error; then the FileNotFoundException should disappear. On Wed, Dec 18, 2013 at 11:56 AM, Nathan Kronenfeld < nkronenf...@oculusinfo.com> wrote: > Hi, Folks. > > I was wondering if anyone has encountered the following error before; I

Spark streaming vs. spark usage

2013-12-17 Thread Nathan Kronenfeld
Hi, Folks. We've just started looking at Spark Streaming, and I find myself a little confused. As I understood it, one of the main points of the system was that one could use the same code when streaming, doing batch processing, or whatnot. Yet when we try to apply a batch processor that analyze
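The reuse being asked about can be sketched against the 0.8-era APIs (paths, host, and port below are made up): the batch logic is written once as an RDD-to-RDD function, applied directly to an RDD for the batch job and to every micro-batch of a DStream via transform().

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SharedWordCount {
  // Batch logic written once, purely in terms of RDDs
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)

  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "shared-wordcount")

    // Batch: apply the function to an ordinary RDD
    wordCounts(sc.textFile("hdfs:///data/input.txt"))
      .saveAsTextFile("hdfs:///data/batch-output")

    // Streaming: reuse the same function on every micro-batch via transform()
    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.socketTextStream("localhost", 9999)
      .transform(rdd => wordCounts(rdd))
      .print()
    ssc.start()
  }
}
```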

FileNotFoundException running spark job

2013-12-17 Thread Nathan Kronenfeld
Hi, Folks. I was wondering if anyone has encountered the following error before; I've been staring at this all day and can't figure out what it means. In my client log, I get: [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Lost TID 282 (task 3.0:63) [INFO] 17 Dec 2013 22:31:09 - o

Problem when trying to modify data generated with collect() method from RDD

2013-12-17 Thread 杨强
Hi, everyone. I'm using Scala to implement a connected-components algorithm in Spark, and the code in question is as follows: type Graph = ListBuffer[Array[String]]  type CCS = ListBuffer[Graph]  val ccs_array: Array[CCS] = graphs_rdd.map{ graph => find_cc(graph) }.collect()  var
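The question is cut off in this preview, but one property of collect() is worth a sketch here (the names below are illustrative, not the poster's): collect() materializes a local copy of the results on the driver, so mutating the collected values does not change what the RDD produces on its next action, because the uncached RDD is simply recomputed.

```scala
import org.apache.spark.SparkContext
import scala.collection.mutable.ListBuffer

object CollectIsACopy {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "collect-is-a-copy")

    // Not cached, so every action recomputes the map
    val rdd = sc.parallelize(1 to 3).map(i => ListBuffer("item" + i))

    val local = rdd.collect()   // driver-side copies of the elements
    local(0) += "MODIFIED"      // mutates only the local copy

    println(local(0))           // ListBuffer(item1, MODIFIED)
    println(rdd.collect()(0))   // ListBuffer(item1) -- the RDD is unaffected
    sc.stop()
  }
}
```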

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread Patrick Wendell
Hey Philip, No - those are compiled against the "mr1" version. You'll need to build it yourself for YARN. - Patrick On Tue, Dec 17, 2013 at 10:32 AM, Philip Ogren wrote: > I have a question about the pre-built binary for 0.8.0 for CDH 4 listed > here: > > http://spark.incubator.apache.org/download

Re: Repartitioning an RDD

2013-12-17 Thread Patrick Wendell
Master and 0.8.1 (soon to be released) have `repartition`. It's actually a new feature, not an old one! On Tue, Dec 17, 2013 at 4:31 PM, Mark Hamstra wrote: > https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L280 > > > On Tue, Dec 17, 2013 at

Re: Repartitioning an RDD

2013-12-17 Thread Mark Hamstra
https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L280 On Tue, Dec 17, 2013 at 4:26 PM, Matei Zaharia wrote: > I’m not sure if a method called repartition() ever existed in an official > release, since we don’t remove methods, but there is a

Re: Repartitioning an RDD

2013-12-17 Thread Matei Zaharia
I’m not sure if a method called repartition() ever existed in an official release, since we don’t remove methods, but there is a method called coalesce() that does what you want. You just tell it the desired new number of partitions. You can also have it shuffle the data across the cluster to re
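A small sketch of the coalesce() behavior described above, against a hypothetical 10-partition RDD: without a shuffle it can only reduce the number of partitions, while passing shuffle = true redistributes the data and can also increase the count.

```scala
import org.apache.spark.SparkContext

object CoalesceSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "coalesce-sketch")
    val rdd = sc.parallelize(1 to 1000, 10)          // 10 partitions

    val fewer = rdd.coalesce(4)                      // shrink without a shuffle
    val more  = rdd.coalesce(20, shuffle = true)     // shuffle to rebalance or increase

    println(fewer.partitions.length)                 // 4
    println(more.partitions.length)                  // 20
    sc.stop()
  }
}
```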

Repartitioning an RDD

2013-12-17 Thread Mahdi Namazifar
Hi everyone, I have a question regarding appending two RDDs using the union function, and I would appreciate if anyone could help me with it. I have two RDDs (let's call them RDD_1 and RDD_2) with the same number of partitions (let's say 10) and they are defined based on the rows of the same set
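The question is truncated here, but the partitioning side of it can be illustrated with a short sketch (two hypothetical 10-partition RDDs): union() simply concatenates the partition lists of its inputs, and coalesce(), discussed in the replies above, can bring the count back down.

```scala
import org.apache.spark.SparkContext

object UnionPartitions {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "union-partitions")
    val rdd1 = sc.parallelize(1 to 100, 10)      // 10 partitions
    val rdd2 = sc.parallelize(101 to 200, 10)    // 10 partitions

    val both = rdd1.union(rdd2)
    println(both.partitions.length)              // 20: the partition lists are concatenated

    println(both.coalesce(10).partitions.length) // 10: merged back down, no shuffle needed
    sc.stop()
  }
}
```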

spark pre-built binaries for 0.8.0

2013-12-17 Thread Philip Ogren
I have a question about the pre-built binary for 0.8.0 for CDH 4 listed here: http://spark.incubator.apache.org/downloads.html and linked here: http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz Should I expect that this release is compiled with YARN enabled (i.e. the SPAR

Bagel message processing vs. group-by operational efficiency

2013-12-17 Thread Dmitriy Lyubimov
Hello, I have a quick question: it just recently occurred to me that in Spark, group-by is not shuffle-and-sort but rather "shuffle-and-hash", i.e. there's no sorting phase. Right? In that light, a single Bagel iteration should really cost just as much as message grouping with the regular "group
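For reference, the "regular group-by" being compared against would look roughly like the sketch below (the message type and data are made up): groupByKey() hash-partitions the pairs by key and gathers each key's values into one collection, with no sort phase.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object MessageGrouping {
  case class Msg(payload: Double)   // hypothetical message type

  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "message-grouping")

    // Messages keyed by destination vertex id, as a superstep would emit them
    val messages = sc.parallelize(Seq((1L, Msg(0.5)), (2L, Msg(0.1)), (1L, Msg(0.3))))

    // Hash-based grouping: shuffle by key, collect each key's messages together
    val grouped = messages.groupByKey()
    grouped.collect().foreach { case (vid, msgs) => println(vid + " -> " + msgs.mkString(", ")) }
    sc.stop()
  }
}
```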

Re: spark through vpn, SPARK_LOCAL_IP

2013-12-17 Thread viren kumar
Is that really the only solution? I too am faced with the same problem of running the driver on a machine with two IPs, one internal and one external. I launch the job and the Spark server fails to connect to the client since it tries on the internal IP. I tried setting SPARK_LOCAL_IP, but to no av

Re: Task not running in standalone cluster

2013-12-17 Thread Andrew Ash
Glad you got it figured out! On Tue, Dec 17, 2013 at 8:43 AM, Jie Deng wrote: > don't bother... My problem was solved by using spark-0.9 instead of 0.8... because 0.9 > fixed a bug which allows running from Eclipse. > > > 2013/12/17 Jie Deng > >> When I start a task on master, I can see there is a >> CoarseGrainedExec

Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
Don't bother... my problem was solved by using spark-0.9 instead of 0.8... because 0.9 fixed a bug which allows running from Eclipse. 2013/12/17 Jie Deng > When I start a task on the master, I can see there is a > CoarseGrainedExecutorBackend java process running on the worker, is that saying > something? > > > 2013/12/17 Jie

Re: Re: OOM, help

2013-12-17 Thread Jie Deng
Eh... it's hard to say why 9g is not enough, but your file is 7g, and each string in that file, once turned into a Java object, needs even more memory; I think you can try using HDFS to store the processing data instead of putting everything in memory. 2013/12/17 leosand...@gmail.com > Hi, > I have set m
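A minimal sketch of one way to follow that suggestion (the master URL and paths are made up): write the results out to HDFS instead of pulling everything back into one JVM, so nothing has to fit in memory at once.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCountToHdfs {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "wordcount-to-hdfs")

    val counts = sc.textFile("hdfs:///input/big.txt")   // the large input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Store the result in HDFS rather than collect()-ing it to the driver
    counts.saveAsTextFile("hdfs:///output/wordcount")
    sc.stop()
  }
}
```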

Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
When I start a task on the master, I can see there is a CoarseGrainedExecutorBackend java process running on the worker; is that saying something? 2013/12/17 Jie Deng > Hi Andrew, > > Thanks for helping! > Sorry I did not make myself clear, here is the output from iptables (both > master and worker):

Re: Re: OOM, help

2013-12-17 Thread leosand...@gmail.com
Hi, I have set my config with: export SPARK_WORKER_MEMORY=1024m export SPARK_DAEMON_JAVA_OPTS=9000m Why is the memory still not enough? Thanks leosand...@gmail.com From: Jie Deng Date: 2013-12-17 19:44 To: user Subject: Re: OOM, help Hi Leo, I think java.lang.OutOfMemoryError: Java heap

Re: OOM, help

2013-12-17 Thread Jie Deng
Hi Leo, I think java.lang.OutOfMemoryError: Java heap space is caused by a JVM memory problem, not by Spark itself. Just try -Xmx with more memory when starting the JVM. 2013/12/17 leosand...@gmail.com > Hello everyone, > I have a problem when I run the wordcount example. I read data from hdfs, > i
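For a Spark job the relevant heap is usually the executors'; a minimal sketch of one way to raise it in the 0.8-era configuration style (the value and master URL are made up) is to set the executor-memory property before the SparkContext is created, which Spark uses when sizing the executor JVMs.

```scala
import org.apache.spark.SparkContext

object MoreExecutorMemory {
  def main(args: Array[String]) {
    // Must be set before the SparkContext is constructed
    System.setProperty("spark.executor.memory", "4g")

    val sc = new SparkContext("spark://master:7077", "more-executor-memory")
    // ... job code ...
    sc.stop()
  }
}
```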

Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
Hi Andrew, Thanks for helping! Sorry I did not make myself clear, here is the output from iptables (both master and worker): jie@jie-OptiPlex-7010:~/spark$ sudo ufw status Status: inactive jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L Chain INPUT (policy ACCEPT) target prot opt source