I'm trying to unit test a function that reads a JSON file, manipulates
the resulting DataFrame, and returns a Scala Map.
The function has this signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
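For context, the body does roughly the following (a simplified sketch;
the column name and the Map's type parameters are placeholders, not my
real code):

  // Simplified sketch of the function under test; "someColumn" and the
  // Map[String, Long] result type are placeholders.
  def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext): Map[String, Long] = {
    val df = sqlContext.read.json(dataLocation)  // read the JSON file into a DataFrame
    df.groupBy("someColumn").count()             // manipulate the DataFrame
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap
  }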
I've created a bootstrap spec for Spark jobs that instantiates the
SparkContext and SQLContext like so:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll(): Unit = {
  // Reset ports left behind by any previous context in this JVM
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")
  val conf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)
  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}
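An individual spec then mixes in this bootstrap and uses the shared
contexts, roughly like this (SparkSpec, the test file path, and the
expected key are placeholder names, not my real code):

  // Hypothetical spec built on the bootstrap above; assumes SparkSpec
  // extends FlatSpec with Matchers and runs the beforeAll shown earlier.
  class IngestSpec extends SparkSpec {
    "ingest" should "turn a JSON file into a Map" in {
      val result = ingest("src/test/resources/sample.json", sc, sqlContext)
      result should contain key "expectedKey"
    }
  }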
When I do not include the sqlContext, my tests run. Once I add it, I
get the following errors:
16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
constructed (or threw an exception in its constructor). This may indicate
an error, since only one SparkContext may be running in this JVM (see
SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not
unique!
and finally:
[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class
name: com.company.package.IngestSpec *** ABORTED ***
[info] akka.actor.InvalidActorNameException: actor name
[ExecutorEndpoint] is not unique!
What do I need to do to get a SQLContext into my tests?
Thanks,
-- Steve