Hi Alexander,

We developed SMV to address exactly the issue you mention. While it is not a workflow engine per se, it does allow the creation of modules with dependencies and automates the execution of those modules. See https://github.com/TresAmigosSD/SMV/blob/master/docs/user/smv_intro.md for an introduction and https://github.com/TresAmigosSD/SMV for the source.
SmvModules provide the following:

* Grouping of DataFrame operations with dependencies.
* Automatic versioning of module results, so subsequent runs do not require re-running the entire app.
* Automatic detection of module code changes.
* Grouping of modules into stages, and publishing of stage results, so that large teams can work independently.
* Module-level dependency graphs. These higher-level graphs tend to show the intent of the application better than low-level relational algebra graphs.

-- Ali

On Dec 10, 2015, at 10:50 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> Hi Everyone
>
> I'm curious what people usually use to build ETL workflows based on
> DataFrames and Spark API?
>
> In Hadoop/Hive world people usually use Oozie. Is it different in Spark
> world?