There's an undocumented mode that looks like it simulates a cluster: SparkContext.scala: // Regular expression for simulating a Spark cluster of [N, cores, memory] locally val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
can you running your tests with a master URL of "local-cluster[2,2,512]" to see if that does serialization? On Wed, May 14, 2014 at 3:34 AM, Andras Nemeth < andras.nem...@lynxanalytics.com> wrote: > Hi, > > Spark's local mode is great to create simple unit tests for our spark > logic. The disadvantage however is that certain types of problems are never > exposed in local mode because things never need to be put on the wire. > > E.g. if I accidentally use a closure which has something non-serializable > in it, then my test will happily succeed in local mode but go down in > flames on a real cluster. > > Other example is kryo: I'd like to use setRegistrationRequired(true) to > avoid any hidden performance problems due to forgotten registration. And of > course I'd like things to fail in tests. But it won't happen because we > never actually need to serialize the RDDs in local mode. > > So, is there some good solution to the above problems? Is there some > local-like mode which simulates serializations as well? Or is there an easy > way to start up *from code* a standalone spark cluster on the machine > running the unit test? > > Thanks, > Andras > >