That's a good point - there is an open issue for spark-testing-base to support this shared SparkSession approach ( https://github.com/holdenk/spark-testing-base/issues/123 ), but I haven't had the time. I'll try to include this in the next release :)
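For anyone following along, a minimal sketch of the shared-session idea discussed below: build one SparkSession lazily in a shared object so every suite reuses it, rather than starting a local master per test. This is illustrative, not spark-testing-base's actual API - the names SharedSparkSession, MyTransformSuite, and the test itself are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object SharedSparkSession {
  // Built once per JVM; ScalaTest suites running in parallel all reuse it.
  lazy val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("shared-test-session")
    .getOrCreate()
}

class MyTransformSuite extends org.scalatest.funsuite.AnyFunSuite {
  private val spark = SharedSparkSession.spark
  import spark.implicits._

  test("incrementing a column") {
    val df     = Seq(1, 2, 3).toDF("n")
    val result = df.select(($"n" + 1).as("n")).as[Int].collect()
    assert(result.sorted.sameElements(Array(2, 3, 4)))
  }
}
```

The trade-off is that all suites share one session's state (temp views, UDF registrations), so tests need to avoid global side effects - but avoiding repeated session startup is what makes parallel runs fast.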
On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers <ko...@tresata.com> wrote:

> we share a single SparkSession across tests, and they can run in
> parallel. is pretty fast
>
> On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson <ever...@nuna.com.invalid> wrote:
>
>> Hi,
>>
>> Right now, if any code uses DataFrame/Dataset, I need a test setup that
>> brings up a local master as in this article
>> <http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>.
>>
>> That's a lot of overhead for unit testing and the tests can't run in
>> parallel, so testing is slow -- this is more like what I'd call an
>> integration test.
>>
>> Do people have any tricks to get around this? Maybe using spy mocks on
>> fake DataFrame/Datasets?
>>
>> Anyone know if there are plans to make more traditional unit testing
>> possible with Spark SQL, perhaps with a stripped-down in-memory
>> implementation? (I admit this does seem quite hard since there's so much
>> functionality in these classes!)
>>
>> Thanks!
>>
>> - Everett

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau