Hi,

Sorry for the very slow reply - I am far behind in my mailing list subscriptions.
You'll find a few slides covering the topic in this presentation:
https://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458

Video here: https://vimeo.com/192429554

Regards,

Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109

On Tue, Feb 25, 2020 at 7:46 PM Ruijing Li <liruijin...@gmail.com> wrote:
>
> Just wanted to follow up on this. If anyone has any advice, I’d be interested
> in learning more!
>
> On Thu, Feb 20, 2020 at 6:09 PM Ruijing Li <liruijin...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I’m interested in hearing the community’s thoughts on best practices to do
>> integration testing for spark sql jobs. We run a lot of our jobs with cloud
>> infrastructure and hdfs - this makes debugging a challenge for us,
>> especially with problems that don’t occur from just initializing a
>> sparksession locally or testing with spark-shell. Ideally, we’d like some
>> sort of docker container emulating hdfs and spark cluster mode, that you can
>> run locally.
>>
>> Any test framework, tips, or examples people can share? Thanks!
>> --
>> Cheers,
>> Ruijing Li
>
> --
> Cheers,
> Ruijing Li

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org