I think the regular 'local' mode will work for testing serialization; it serializes both tasks and results precisely so that serialization errors are caught:

https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala#L187
https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala#L200
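
For example, here's a minimal sketch of how you can see this in local mode (the NotSerializable class and all names below are made up for illustration):

import org.apache.spark.SparkContext

// Deliberately not Serializable.
class NotSerializable {
  val offset = 1
}

val sc = new SparkContext("local[2]", "serialization-test")
val ns = new NotSerializable

// This closure captures `ns` from the enclosing scope, so the task has
// to be serialized to ship it -- and local mode serializes tasks too
// (per the lines linked above), so this should fail with a
// NotSerializableException:
sc.parallelize(1 to 10).map(x => x + ns.offset).collect()

// By contrast, a function-local val lives inside the closure body and
// serializes fine, which is why the "val x: Int = 1" didn't trip it:
sc.parallelize(1 to 10).map { x => val y: Int = 1; x + y }.collect()

sc.stop()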
On Mon, Oct 21, 2013 at 1:06 PM, Aaron Davidson <[email protected]> wrote:

> To answer your second question first, you can use the SparkContext format
> "local-cluster[2, 1, 512]" (instead of "local[2]"), which would create a
> local test cluster with 2 workers, each with 1 core and 512 MB of memory.
> This should allow you to accurately test things like serialization.
>
> I don't believe that adding a function-local variable would cause the
> function to be unserializable, though. The only concern when shipping
> around functions is when they refer to variables *outside* the function's
> scope, in which case Spark will automatically ship those variables to all
> workers (unless you override this behavior with a broadcast or accumulator
> variable<http://spark.incubator.apache.org/docs/0.7.3/scala-programming-guide.html#shared-variables>).
>
>
> On Mon, Oct 21, 2013 at 10:30 AM, Shay Seng <[email protected]> wrote:
>
>> I'm trying to write a unit test to ensure that some functions I rely on
>> will always serialize and run correctly on a cluster.
>> In one of these functions I've deliberately added a "val x:Int = 1",
>> which should prevent this method from being serializable, right?
>>
>> In the test I've done:
>> sc = new SparkContext("local[2]","test")
>> ...
>> val pdata = sc.parallelize(data)
>> val c = pdata.map().collect()
>>
>> The unit tests still complete with no errors; I'm guessing that's because
>> Spark knows that local[2] doesn't require serialization? Is there some
>> way I can force Spark to run like it would on a real cluster?
>>
>> tks
>> shay
>
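
P.S. If you do want the full cluster-like code path, Aaron's local-cluster suggestion would look roughly like this (again a sketch, with made-up names; note that local-cluster mode launches real worker processes, so as far as I know it needs a local Spark build available, e.g. via SPARK_HOME):

import org.apache.spark.SparkContext

// "local-cluster[2, 1, 512]": 2 workers, 1 core and 512 MB each.
// Tasks are shipped to separate worker processes, so serialization is
// exercised end-to-end, much like on a standalone cluster.
val sc = new SparkContext("local-cluster[2, 1, 512]", "cluster-test")

val data = 1 to 100
val pdata = sc.parallelize(data)
val c = pdata.map(_ * 2).collect()  // runs on the 2 local workers
println(c.length)
sc.stop()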
