Hi Alexander,

We developed SMV to address exactly the issue you mention. While it is not a workflow engine per se, it does allow the creation of modules with dependencies and automates the execution of those modules. See https://github.com/TresAmigosSD/SMV/blob/master/docs/user/smv_intro.md for an introduction and https://github.com/TresAmigosSD/SMV for the source.
SmvModules provide the following:

* Grouping of DataFrame operations with dependencies.
* Automatic versioning of module results, so subsequent runs do not require re-running the entire app.
* Automatic detection of module code changes.
* Grouping of modules into stages, and publishing of stage results, so that large teams can work independently.
* Module-level dependency graphs. These higher-level graphs tend to show the intent of the application better than low-level relational algebra graphs.

-- Ali

On Dec 10, 2015, at 10:50 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> Hi Everyone
>
> I'm curious what people usually use to build ETL workflows based on
> DataFrames and Spark API?
>
> In Hadoop/Hive world people usually use Oozie. Is it different in Spark
> world?