Thanks, that really helps.
So that helps me cache the Spark context within a suite but not across
suites. The closest I could find to caching across suites is extending
Suites [1] and adding @DoNotDiscover annotations to the nested suites:

class SparkSuites extends Suites(
  new SomeSuite1,
  new SomeSuite2
)
But that means every time I add a new Suite, I have to go add it to
SparkSuites.
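The behavior I'm after could be sketched in plain Scala (no Spark or
ScalaTest on the classpath here) as a singleton that lazily creates one
shared context on first use, hands the same instance to every suite that
asks, and tears it down in a JVM shutdown hook instead of per suite. The
names Context, SharedContext, and the shutdown-hook cleanup are
hypothetical stand-ins for SparkContext and the spark.driver.port cleanup,
not real Spark API:

```scala
// Hypothetical stand-in for SparkContext, so the sketch is self-contained.
final class Context(val name: String) {
  @volatile var stopped = false
  def stop(): Unit = { stopped = true }
}

// Lazily creates one context on first use and reuses it for every caller,
// which is the cross-suite caching the Suites approach tries to achieve.
object SharedContext {
  lazy val sc: Context = {
    val ctx = new Context("shared-test-context")
    // Stop the context once, when the test JVM exits, rather than per suite.
    // In the real setup this is also where spark.driver.port would be cleared.
    sys.addShutdownHook {
      ctx.stop()
    }
    ctx
  }
}

object Demo extends App {
  val a = SharedContext.sc
  val b = SharedContext.sc
  println(a eq b) // true: every caller sees the same instance
}
```

The upside of this shape is that new suites pick up the shared context
automatically, with nothing to register in a central SparkSuites class.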
Ameet
[1] http://www.artima.com/docs-scalatest-1.7.RC1/org/scalatest/Suites.html
On Wed, Feb 19, 2014 at 1:53 AM, Heiko Braun <[email protected]> wrote:
> Take a look at the trait the spark tests are using:
>
>
> https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/SharedSparkContext.scala?source=cc
>
> /Heiko
>
> On 18 Feb 2014, at 22:36, Ameet Kini <[email protected]> wrote:
>
>
> I'm writing unit tests with Spark and need some help.
>
> I've already read this helpful article:
> http://blog.quantifind.com/posts/spark-unit-test/
>
> There are a couple differences in my testing environment versus the blog.
> 1. I'm using FunSpec instead of FunSuite. So my tests look like
>
> class MyTestSpec extends FunSpec {
>   describe("A suite of tests") {
>     it("should do something") {
>       // test code
>     }
>     it("should do something else") {
>       // test code
>     }
>   }
>   describe("Another suite of tests") {
>     it("should do something") {
>       // test code
>     }
>     it("should do something else") {
>       // test code
>     }
>   }
> }
> 2. I'd like to ideally reuse the SparkContext as much as possible.
> Currently I'm using fixture.FunSpec's withFixture and using the loan
> pattern to loan the SparkContext to the test.
>
> So,
> trait SparkEnvironment extends fixture.FunSpec {
>   type FixtureParam = SparkContext
>
>   override def withFixture(test: OneArgTest) = {
>     val sc = SparkUtils.createSparkContext("local", "some name")
>     try {
>       test(sc)
>     } finally {
>       sc.stop()
>       System.clearProperty("spark.driver.port")
>     }
>   }
> }
>
>
> While that works, it ends up creating a Spark context per test. Ideally
> I'd like to share it across all suites (so, across more than one of my
> TestSpec classes); failing that, across the multiple describe blocks
> within a MyTestSpec class; and failing that, at least across the tests
> within a suite. But I don't know how. Right now, each of my "it" tests
> creates a new Spark context, and that's really slowing things down.
>
> I tried creating a singleton object and loaning that object to multiple
> tests, but Spark threw an exception saying it can't find some file. I'm
> sure it's something I'm (not) doing, as I can't think of a reason why
> SparkContexts can't be shared across tests like that.
>
> object SparkEnvironment {
>   var _sc: SparkContext = null
>
>   def sc = {
>     if (_sc == null) _sc = SparkUtils.createSparkContext(..)
>     _sc
>   }
> }
>
> Thanks,
> Ameet
>