I've switched to using something like SharedSparkContext. Now I frequently see a FileNotFoundException in the tests. It seems to be a race condition, as it doesn't always occur. The exception crops up while running the job, so creating the Spark context goes through fine. Also, it occurs only in the second test within a suite; the first test always runs fine.
14/02/19 11:05:18 INFO TaskSetManager: Serialized task 1.0:0 as 2436 bytes in 1 ms
14/02/19 11:05:18 INFO Executor: Running task ID 1
14/02/19 11:05:18 INFO HttpBroadcast: Started reading broadcast variable 1
14/02/19 11:05:18 ERROR Executor: Exception in task ID 1
java.io.FileNotFoundException: http://192.168.1.5:34426/broadcast_1
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
        at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
        at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
        <snip>

Googling the exception, I found this issue with the same symptom (the broadcast variable is not found) but in a different context (streams):
https://spark-project.atlassian.net/browse/STREAMING-38

Ameet

On Wed, Feb 19, 2014 at 10:57 AM, Ameet Kini <[email protected]> wrote:

> Thanks, that really helps.
>
> That lets me cache the Spark context within a suite but not across
> suites. The closest I could find to caching across suites is extending
> Suites [1] and adding @DoNotDiscover annotations to the nested suites:
>
> class SparkSuites extends Suites(
>   new SomeSuite1,
>   new SomeSuite2
> )
>
> But that means every time I add a new suite, I have to go add it to
> SparkSuites.
>
> Ameet
>
> [1] http://www.artima.com/docs-scalatest-1.7.RC1/org/scalatest/Suites.html
>
>
> On Wed, Feb 19, 2014 at 1:53 AM, Heiko Braun <[email protected]> wrote:
>
>> Take a look at the trait the Spark tests are using:
>>
>> https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/SharedSparkContext.scala?source=cc
>>
>> /Heiko
>>
>> On 18 Feb 2014, at 22:36, Ameet Kini <[email protected]> wrote:
>>
>> I'm writing unit tests with Spark and need some help.
>>
>> I've already read this helpful article:
>> http://blog.quantifind.com/posts/spark-unit-test/
>>
>> There are a couple of differences between my testing environment and the blog's:
>> 1. I'm using FunSpec instead of FunSuite.
>> So my tests look like:
>>
>> class MyTestSpec extends FunSpec {
>>   describe("A suite of tests") {
>>     it("should do something") {
>>       // test code
>>     }
>>     it("should do something else") {
>>       // test code
>>     }
>>   }
>>   describe("Another suite of tests") {
>>     it("should do something") {
>>       // test code
>>     }
>>     it("should do something else") {
>>       // test code
>>     }
>>   }
>> }
>>
>> 2. I'd like to ideally reuse the SparkContext as much as possible.
>> Currently I'm using fixture.FunSpec's withFixture and the loan pattern
>> to loan the SparkContext to the test:
>>
>> trait SparkEnvironment extends fixture.FunSpec {
>>   type FixtureParam = SparkContext
>>   def withFixture(test: OneArgTest) {
>>     val sc = SparkUtils.createSparkContext("local", "some name")
>>     try {
>>       test(sc)
>>     } finally {
>>       sc.stop
>>       System.clearProperty("spark.driver.port")
>>     }
>>   }
>> }
>>
>> While that works, it ends up creating a Spark context per test. Ideally
>> I'd like to share it across all suites (so, across more than one of my
>> TestSpec classes); less preferably, across multiple suites within a
>> MyTestSpec class; and even less preferably, across tests within a
>> suite, but I don't know how. Right now, each of my "it" tests creates a
>> new Spark context, and it's really slowing things down.
>>
>> I tried creating a singleton object and loaning that object to multiple
>> tests, but Spark threw an exception saying it can't find some file. I'm
>> sure it's something I'm (not) doing, as I can't think of a reason why
>> SparkContexts can't be shared across tests like that.
>>
>> object SparkEnvironment {
>>   var _sc: SparkContext = null
>>   def sc = {
>>     if (_sc == null) _sc = SparkUtils.createSparkContext(..)
>>     _sc
>>   }
>> }
>>
>> Thanks,
>> Ameet
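[Editor's note] Following up on the singleton approach in the quoted thread: below is a minimal sketch of the lazy-singleton pattern it was reaching for, using a thread-safe `lazy val` instead of the null check and stopping the context once via a JVM shutdown hook rather than per suite. `FakeContext` is a hypothetical stand-in for `SparkContext` so the snippet is self-contained; with real Spark you would build the context with the poster's `SparkUtils.createSparkContext` helper and also clear `spark.driver.port` in the hook.

```scala
// A stand-in for SparkContext so this sketch compiles without Spark on the
// classpath; substitute a real SparkContext in an actual test harness.
class FakeContext {
  var stopped = false
  def stop(): Unit = { stopped = true }
}

object SharedContext {
  // `lazy val` initializes on first access and is thread-safe under
  // concurrent suite execution, unlike the if-null pattern in the thread.
  lazy val sc: FakeContext = {
    val ctx = new FakeContext
    // Tear down once, when the whole test JVM exits, so every suite that
    // touches SharedContext.sc sees the same live context.
    sys.addShutdownHook {
      ctx.stop()
      // With a real SparkContext you would also do:
      // System.clearProperty("spark.driver.port")
    }
    ctx
  }
}
```

Every suite that refers to `SharedContext.sc` then shares one context for the lifetime of the test JVM, with no per-suite registration as in the `Suites`/`@DoNotDiscover` approach.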
