Re: Spark unit testing best practices

2014-05-16 Thread Andras Nemeth
Thanks for the answers! As a concrete example, here is what I did to test my (wrong :) ) hypothesis before writing my email:

    class SomethingNotSerializable {
      def process(a: Int): Int = 2 * a
    }
    object NonSerializableClosure extends App {
      val sc = new spark.SparkContext(
        "local",
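The serialization behaviour itself can be checked without Spark at all. What follows is a minimal plain-Scala sketch. It assumes that Java serialization, which Spark uses to ship closures to executors, is the mechanism that would fail on the wire. `ClosureSerializationCheck`, `isJavaSerializable`, `capturingHelper` and `capturingInt` are hypothetical names for illustration only, not Spark API.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// The class from the email: it does not extend Serializable.
class SomethingNotSerializable {
  def process(a: Int): Int = 2 * a
}

// Hypothetical helpers, not part of Spark.
object ClosureSerializationCheck {
  // Returns true if Java serialization of `value` succeeds.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // A closure that captures an instance of the non-serializable class.
  def capturingHelper(helper: SomethingNotSerializable): Int => Int =
    (a: Int) => helper.process(a)

  // A closure that captures only an Int, which is serializable.
  def capturingInt(factor: Int): Int => Int =
    (a: Int) => factor * a
}
```

Scala function literals are serializable by themselves. What fails is writing the captured `SomethingNotSerializable` instance, so `capturingInt(2)` passes the check while `capturingHelper(new SomethingNotSerializable)` does not.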

Re: Spark unit testing best practices

2014-05-16 Thread Nan Zhu
+1. At least with the current code, just watch the log printed by the DAGScheduler…
-- Nan Zhu
On Wednesday, May 14, 2014 at 1:58 PM, Mark Hamstra wrote: serDe

Spark unit testing best practices

2014-05-15 Thread Andras Nemeth
Hi, Spark's local mode is great for creating simple unit tests for our Spark logic. The disadvantage, however, is that certain types of problems are never exposed in local mode because things never need to be put on the wire. E.g., if I accidentally use a closure which has something non-serializable
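One way to expose this class of bug in a purely local unit test is to push the values you care about through a Java serialization round trip yourself. This is offered as an assumption, not a Spark feature. The sketch below serializes a value to bytes and reads it back. `WireSimulation` and `roundTrip` are hypothetical names. The custom `resolveClass` uses the helper's own class loader, which tends to matter in REPL or script environments.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical test helper, not part of Spark.
object WireSimulation {
  // Serializes `value` and deserializes it again, roughly what happens
  // when a value has to travel between JVMs.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()

    val loader = getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)) {
      // Resolve classes with this helper's class loader; fall back to the default
      // (which also handles primitive type names such as "int").
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, loader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

A test might call `WireSimulation.roundTrip(myFunction)` before applying the function. A non-serializable capture then fails with `NotSerializableException` even without a cluster. This only exercises Java serialization. If a job uses a different serializer for data (Kryo, for example), that path would need its own check.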

Re: Spark unit testing best practices

2014-05-14 Thread Andrew Ash
There's an undocumented mode that looks like it simulates a cluster:

SparkContext.scala:
    // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
    val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r

Can you try running your tests
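The regex quoted above implies a master string of the form `local-cluster[N, cores, memory]`. The small sketch below applies the same pattern to show what each group captures. `LocalClusterMaster` and the `Spec` fields are hypothetical names. Reading the third group as memory per worker in megabytes is my assumption about how SparkContext uses it.

```scala
// Hypothetical parser mirroring LOCAL_CLUSTER_REGEX from SparkContext.scala.
object LocalClusterMaster {
  val LocalClusterRegex = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r

  // Holder for the three captured groups: [N, cores, memory].
  final case class Spec(workers: Int, coresPerWorker: Int, memoryPerWorkerMb: Int)

  // Scala's Regex extractor only matches if the whole string matches.
  def parse(master: String): Option[Spec] = master match {
    case LocalClusterRegex(n, cores, memory) =>
      Some(Spec(n.toInt, cores.toInt, memory.toInt))
    case _ => None
  }
}
```

A test would then presumably pass something like `"local-cluster[2,1,512]"` as the master string when constructing the SparkContext. As I understand it, this mode starts executors in separate JVM processes, so closures really do get serialized.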

Re: Spark unit testing best practices

2014-05-14 Thread Philip Ogren
Have you actually found this to be true? I have found Spark local mode to be quite good about blowing up if there is something non-serializable, so my unit tests have been great for detecting this. I have never seen something that worked in local mode that didn't work on the cluster