On Jun 14, 2014 4:05 AM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
> You need to factor your program so that it’s not just a main(). This is
> not a Spark-specific issue, it’s about how you’d unit test any program in
> general. In this case, your main() creates a SparkContext, so you can’t
> pass one in from outside, and your code has to read data from a file and
> write it to a file. It would be better to move your code for transforming
> data into a new function:
>
> def processData(lines: RDD[String]): RDD[String] = {
>   // build and return your "res" variable
> }
>
> Then you can unit-test this directly on data you create in your program:
>
> val myLines = sc.parallelize(Seq("line 1", "line 2"))
> val result = GetInfo.processData(myLines).collect()
> assert(result.toSet === Set("res 1", "res 2"))
>
> Matei
>
> On Jun 13, 2014, at 2:42 PM, SK <skrishna...@gmail.com> wrote:
>
> > Hi,
> >
> > I have looked through some of the test examples and also the brief
> > documentation on unit testing at
> > http://spark.apache.org/docs/latest/programming-guide.html#unit-testing,
> > but I still don't have a good understanding of how to write unit tests
> > using the Spark framework. Previously, I wrote unit tests with the
> > specs2 framework and got them to work in Scalding. I tried to use the
> > specs2 framework with Spark, but could not find any simple examples to
> > follow. I am open to specs2 or FunSuite, whichever works best with
> > Spark. I would like some additional guidance, or some simple sample
> > code using specs2 or FunSuite. My code is provided below.
> >
> > I have the following code in src/main/scala/GetInfo.scala. It reads a
> > JSON file and extracts some data. It takes the input file (args(0))
> > and output file (args(1)) as arguments.
> >
> > object GetInfo {
> >
> >   def main(args: Array[String]) {
> >     val inp_file = args(0)
> >     val conf = new SparkConf().setAppName("GetInfo")
> >     val sc = new SparkContext(conf)
> >     val res = sc.textFile(inp_file)
> >       .map(line => parse(line))
> >       .map(json => {
> >         implicit lazy val formats = org.json4s.DefaultFormats
> >         val aid = (json \ "d" \ "TypeID").extract[Int]
> >         val ts = (json \ "d" \ "TimeStamp").extract[Long]
> >         val gid = (json \ "d" \ "ID").extract[String]
> >         (aid, ts, gid)
> >       })
> >       .groupBy(tup => tup._3)
> >       .sortByKey(true)
> >       .map(g => (g._1, g._2.map(_._2).max))
> >     res.map(tuple => "%s, %d".format(tuple._1, tuple._2))
> >       .saveAsTextFile(args(1))
> >   }
> > }
> >
> > I would like to test the above code. My unit test is in
> > src/test/scala. The code I have so far for the unit test appears
> > below:
> >
> > import org.apache.spark._
> > import org.specs2.mutable._
> >
> > class GetInfoTest extends Specification with java.io.Serializable {
> >
> >   // each element stands for one line of the JSON input file
> >   val data = List(
> >     """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
> >     """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
> >     """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
> >     """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
> >   )
> >
> >   val expected_out = List(
> >     ("ID1", 5678),
> >     ("ID2", 2468)
> >   )
> >
> >   "A GetInfo job" should {
> >     // ***** How do I pass "data" defined above as the input and
> >     // output which GetInfo expects as arguments? *****
> >     val sc = new SparkContext("local", "GetInfo")
> >
> >     // *** how do I get the output? ***
> >
> >     // assuming out_buffer has the output, I want to match it to the
> >     // expected output
> >     "match expected output" in {
> >       (out_buffer == expected_out) must beTrue
> >     }
> >   }
> > }
> >
> > I would like some help with the tasks marked with "****" in the unit
> > test code above. If specs2 is not the right way to go, I am also open
> > to FunSuite. I would like to know how to pass the input when calling
> > my program from the unit test, and how to get the output back.
> >
> > Thanks for your help.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/guidance-on-simple-unit-testing-with-Spark-tp7604.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
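
Putting the two messages together, one possible shape for the refactored
program and its test is sketched below. This is a sketch under assumptions,
not code from the thread: the RDD[(String, Long)] return type of
processData is inferred from what SK's main() computes, and the import of
org.apache.spark.SparkContext._ is included because Spark 1.x needs it to
make sortByKey available on a pair RDD.

src/main/scala/GetInfo.scala:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD ops such as sortByKey (Spark 1.x)
    import org.apache.spark.rdd.RDD
    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    object GetInfo {

      // The transformation from main(), factored out so that a test can
      // call it on any RDD[String] instead of on a file.
      def processData(lines: RDD[String]): RDD[(String, Long)] =
        lines.map(line => parse(line))
          .map { json =>
            implicit lazy val formats = org.json4s.DefaultFormats
            val aid = (json \ "d" \ "TypeID").extract[Int]
            val ts = (json \ "d" \ "TimeStamp").extract[Long]
            val gid = (json \ "d" \ "ID").extract[String]
            (aid, ts, gid)
          }
          .groupBy(tup => tup._3)
          .sortByKey(true)
          .map(g => (g._1, g._2.map(_._2).max))

      // main() now only handles I/O and wiring.
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("GetInfo")
        val sc = new SparkContext(conf)
        processData(sc.textFile(args(0)))
          .map(tuple => "%s, %d".format(tuple._1, tuple._2))
          .saveAsTextFile(args(1))
        sc.stop()
      }
    }

src/test/scala/GetInfoTest.scala:

    import org.apache.spark.SparkContext
    import org.specs2.mutable.Specification

    class GetInfoTest extends Specification {

      // In-memory stand-in for the input file: one JSON document per element.
      val data = List(
        """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
        """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
        """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
        """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
      )

      val expected_out = Set(("ID1", 5678L), ("ID2", 2468L))

      "A GetInfo job" should {
        "match expected output" in {
          val sc = new SparkContext("local", "GetInfoTest")
          try {
            // parallelize() answers "how do I pass data as input": it turns
            // the list into the RDD[String] that processData expects, and
            // collect() answers "how do I get the output": it brings the
            // result back to the driver as a local array.
            val result = GetInfo.processData(sc.parallelize(data)).collect()
            result.toSet === expected_out
          } finally {
            sc.stop()
          }
        }
      }
    }

Comparing sets rather than lists keeps the assertion independent of
ordering; the sortByKey in processData orders the saved file, but the set
comparison is simply the more robust choice in a test. Stopping the
SparkContext in a finally block ensures repeated test runs don't collide
on a leftover local context.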