There's an undocumented mode that looks like it simulates a cluster:

SparkContext.scala:

    // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
    val LOCAL_CLUSTER_REGEX =
      """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r

Can you try running your tests with a master URL of "local-cluster[2,2,512]" to
see if that exercises serialization?
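As a quick sanity check of what that master URL means, here's a minimal, self-contained sketch (not using Spark itself, just the regex copied from SparkContext.scala) showing how "local-cluster[2,2,512]" is parsed into [workers, cores per worker, memory per worker in MB]:

```scala
object LocalClusterCheck {
  // Copied from SparkContext.scala: matches local-cluster[N, cores, memory]
  val LOCAL_CLUSTER_REGEX =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r

  def main(args: Array[String]): Unit = {
    "local-cluster[2,2,512]" match {
      case LOCAL_CLUSTER_REGEX(workers, cores, memory) =>
        // Extracted capture groups: 2 workers, 2 cores each, 512 MB each
        println(s"workers=$workers cores=$cores memoryMB=$memory")
      case _ =>
        println("no match")
    }
  }
}
```

So passing that string as the master (e.g. via setMaster on your test's SparkConf) should spin up an in-process cluster of 2 workers with 2 cores and 512 MB each, which forces tasks and closures over the wire.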


On Wed, May 14, 2014 at 3:34 AM, Andras Nemeth <
andras.nem...@lynxanalytics.com> wrote:

> Hi,
>
> Spark's local mode is great to create simple unit tests for our spark
> logic. The disadvantage however is that certain types of problems are never
> exposed in local mode because things never need to be put on the wire.
>
> E.g. if I accidentally use a closure which has something non-serializable
> in it, then my test will happily succeed in local mode but go down in
> flames on a real cluster.
>
> Another example is Kryo: I'd like to use setRegistrationRequired(true) to
> avoid any hidden performance problems due to forgotten registration. And of
> course I'd like things to fail in tests. But it won't happen because we
> never actually need to serialize the RDDs in local mode.
>
> So, is there some good solution to the above problems? Is there some
> local-like mode which simulates serializations as well? Or is there an easy
> way to start up *from code* a standalone spark cluster on the machine
> running the unit test?
>
> Thanks,
> Andras
>
>
