Thanks for all the suggestions. Very helpful.

On 17 January 2017 at 22:04, Lars Albertsson <la...@mapflat.com> wrote:
> My advice, short version:
> * Start by testing one job per test.
> * Use Scalatest or a standard framework.
> * Generate input datasets with Spark routines, write to local file.
> * Run the job with a local master.
> * Read output with Spark routines, validate only the fields you care
> about for the test case at hand.
> * Focus on building a functional regression test suite with small test
> cases before testing with large input datasets. The former improves
> productivity more.
>
> Avoid:
> * Test frameworks coupled to your processing technology - they will
> make it difficult to switch.
> * Spending much effort on small unit tests. Internal interfaces in
> Spark tend to be volatile, and testing against them results in high
> maintenance costs.
> * Input files checked in to version control. They are difficult to
> maintain. Generate input files with code instead.
> * Expected output files checked in to VC. Same reason. Validate
> selected fields instead.
>
> For a longer answer, please search for my previous posts to the user
> list, or watch this presentation: https://vimeo.com/192429554
>
> Slides at http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
>
> Regards,
>
> Lars Albertsson
> Data engineering consultant
> www.mapflat.com
> https://twitter.com/lalleal
> +46 70 7687109
> Calendar: https://goo.gl/6FBtlS, https://freebusy.io/la...@mapflat.com
>
> On Sun, Jan 15, 2017 at 7:14 PM, A Shaikh <shaikh.af...@gmail.com> wrote:
> > What's the most popular testing approach for a Spark app? I am looking
> > for something along the lines of TDD.
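
A minimal sketch of the approach described above, assuming ScalaTest 3.x and a hypothetical WordCountJob.run(spark, inputPath, outputPath) entry point that writes a parquet output with "word" and "count" columns (the job name, its signature, and its output format are assumptions, not part of the original advice):

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class WordCountJobTest extends AnyFunSuite {

      test("counts words in a small generated input") {
        // Run the job with a local master, as suggested above.
        val spark = SparkSession.builder()
          .master("local[2]")
          .appName("WordCountJobTest")
          .getOrCreate()
        import spark.implicits._

        // Use a temp directory; input and output paths are created by the test.
        val tmp = java.nio.file.Files.createTempDirectory("wordcount-test").toString
        val inputDir  = s"$tmp/input"
        val outputDir = s"$tmp/output"

        // Generate the input dataset with Spark routines instead of
        // checking input files into version control.
        Seq("a b a", "b c").toDF("line").write.text(inputDir)

        // Run the whole job end to end (WordCountJob.run is a hypothetical
        // entry point for the job under test).
        WordCountJob.run(spark, inputDir, outputDir)

        // Read the output back with Spark routines and validate only the
        // fields this test case cares about.
        val counts = spark.read.parquet(outputDir)
          .select("word", "count")
          .as[(String, Long)]
          .collect()
          .toMap

        assert(counts("a") == 2L)
        assert(counts("b") == 2L)
        assert(counts("c") == 1L)

        spark.stop()
      }
    }

This keeps the test decoupled from Spark internals: it exercises one job per test, generates its own input, runs against a local master, and asserts only on selected output fields rather than comparing against checked-in expected files.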