I suspect the problem has to do with the serialization/deserialization of DataGen. An object that extends App initializes its vals inside DelayedInit, so when the object is reconstructed on a worker those vals can come back as their JVM defaults (zero). I'd try getting rid of the "extends App", writing an explicit main method, and putting your code in there.
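Roughly a minimal sketch of that restructuring (names, Mesos settings, and paths copied from your code below; the row-writing body is elided). The point is that nRowsInCluster becomes a local in main, so it is captured by value when the foreach closure is serialized:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object DataGen {
      def main(args: Array[String]): Unit = {
        val nClusters = 10
        val nRows = 10000
        // A local val is serialized with the closure, so the workers
        // see the real value instead of an uninitialized field.
        val nRowsInCluster = nRows / nClusters

        System.setProperty("spark.executor.uri",
          "hdfs://1b/spark/spark-0.8.0-incubating.tar.gz")
        System.setProperty("spark.mesos.coarse", "true")

        val sc = new SparkContext("mesos://10.0.1.128:5050", "Data Generator",
          "/home/yuzr/spark/spark-0.8.0-incubating",
          List("/home/yuzr/datagen/DataGen-assembly-0.1.jar"))

        val clusters = sc.parallelize(1 to nClusters)
        clusters foreach { x => writePart(x, nRowsInCluster) }
      }

      def writePart(nCluster: Int, nRowsInCluster: Int): Unit = {
        val partWriter = new java.io.PrintWriter("/tmp/y" + nCluster + ".txt")
        // ... generate and write the rows for this cluster ...
        partWriter.close()
      }
    }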
On Fri, Nov 1, 2013 at 3:24 PM, Mohit Jaggi <[email protected]> wrote:
> Hi,
> I wrote a small spark application to generate some random data. It works
> fine if I use "local[n]", but when I use "mesos://..." the vals of the
> outer object that I am using in my function, which is passed to
> RDD.foreach, are being set to zero.
>
> import java.io._
> import math.rint
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object DataGen extends App {
>   val nClusters = 10
>   val nCols = 10000
>   val nRows = 10000
>   val rgen = new util.Random
>
>   System.setProperty("spark.executor.uri",
>     "hdfs://1b/spark/spark-0.8.0-incubating.tar.gz")
>   System.setProperty("spark.mesos.coarse", "true")
>
>   val sc = new SparkContext("mesos://10.0.1.128:5050", "Data Generator",
>     "/home/yuzr/spark/spark-0.8.0-incubating",
>     List("/home/yuzr/datagen/DataGen-assembly-0.1.jar"))
>
>   val clusters = sc.parallelize(1 to nClusters)
>   val nRowsInCluster = nRows / nClusters
>
>   println("nRowsInCluster=" + nRowsInCluster) // ---> prints 1000 in spark driver
>
>   clusters foreach { x => writePart(x, nRowsInCluster) }
>   // clusters foreach writePart --> had this originally
>
>   def writePart(nCluster: Int, nRowsInCluster: Int): Unit = {
>     val partFile = "/tmp/y" + nCluster + ".txt"
>     val partWriter = new java.io.PrintWriter(partFile)
>     ...
>     println("Cluster #" + nCluster)              // --> prints 1 to 10
>     println("nRowsInCluster=" + nRowsInCluster)  // --> prints 0 ??
>     ...
>     }
>
>     partWriter.close
>   }
> }
>
> What am I doing wrong?
>
> Mohit.
