I do not expect the relationship between DAGs to be described in Zeppelin -
that would be done in Airflow.  It just seems that Zeppelin is such a great
tool for a data scientist's workflow that it would be nice if, once they are
done with the work, the note could be productionized directly.  I could
envision a couple of scenarios:

1. Using a Zeppelin instance to run the note via the REST API.  The
instance could be containerized and spun up specifically for a DAG, or it
could be a permanently available one.
2. A note could be pulled from git and some part of the Zeppelin engine
could execute the note without the web UI at all.
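
To make scenario 1 a bit more concrete, here is a rough Python sketch of
running a note over the Zeppelin REST API and waiting for it to finish.  It
assumes the notebook job endpoints (POST and GET on /api/notebook/job/<noteId>)
and the paragraph status names READY/PENDING/RUNNING/FINISHED/ERROR/ABORT.
The response body shape differs between Zeppelin versions, so the parsing is a
guess that accepts two shapes.  The names run_note, call_json and
paragraph_statuses are made up for illustration, and authentication is left out.

```python
import json
import time
import urllib.request


def call_json(method, url):
    """Send an HTTP request with no body and decode the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def paragraph_statuses(payload):
    """Extract paragraph statuses from a job-status response.

    Assumption: "body" is either a list of paragraphs or an object that
    holds a "paragraphs" list, depending on the Zeppelin version.
    """
    body = payload.get("body", [])
    if isinstance(body, dict):
        body = body.get("paragraphs", [])
    return [p.get("status") for p in body]


def run_note(base_url, note_id, call=call_json, poll_seconds=5, max_polls=120):
    """Start every paragraph of a note, then poll until all have finished."""
    job_url = "{}/api/notebook/job/{}".format(base_url.rstrip("/"), note_id)
    call("POST", job_url)  # ask Zeppelin to run all paragraphs
    for _ in range(max_polls):
        statuses = paragraph_statuses(call("GET", job_url))
        if any(s in ("ERROR", "ABORT") for s in statuses):
            raise RuntimeError("note {} failed: {}".format(note_id, statuses))
        if statuses and all(s == "FINISHED" for s in statuses):
            return statuses
        time.sleep(poll_seconds)
    raise TimeoutError("note {} did not finish in time".format(note_id))
```

The `call` parameter is only there so the HTTP layer can be swapped out, for
example in a test or to add login cookies.  If the POST returns before the
paragraphs leave a FINISHED state from a previous run, the first poll could
report success too early, so a real integration would need to guard against
that.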

I would expect there to be some special operators on the Airflow side for
executing these.
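
Such an operator does not exist as far as I know, so the following is only a
sketch of what one might look like.  ZeppelinNoteOperator, its parameters and
the run_note method are hypothetical.  It assumes the same Zeppelin job
endpoints (/api/notebook/job/<noteId>) and status names as scenario 1, and it
falls back to a tiny stand-in BaseOperator so it can be run without Airflow
installed.

```python
import json
import time
import urllib.request

try:
    from airflow.models import BaseOperator
except ImportError:
    class BaseOperator(object):
        """Stand-in so the sketch runs without Airflow installed."""

        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


class ZeppelinNoteOperator(BaseOperator):
    """Hypothetical operator: run one Zeppelin note, fail the task on error."""

    def __init__(self, zeppelin_url, note_id, poll_seconds=10, **kwargs):
        super().__init__(**kwargs)
        self.job_url = "{}/api/notebook/job/{}".format(
            zeppelin_url.rstrip("/"), note_id)
        self.poll_seconds = poll_seconds

    def _call(self, method):
        """Send a body-less request to the job endpoint and decode the JSON."""
        req = urllib.request.Request(self.job_url, method=method)
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def run_note(self):
        """Start all paragraphs, then poll until they finish or one fails."""
        self._call("POST")
        while True:  # bounded in practice by the task's execution_timeout
            body = self._call("GET").get("body", [])
            # Assumption: body is a list, or an object with a "paragraphs" list.
            paragraphs = body.get("paragraphs", []) if isinstance(body, dict) else body
            statuses = [p.get("status") for p in paragraphs]
            if any(s in ("ERROR", "ABORT") for s in statuses):
                raise RuntimeError("Zeppelin note failed: {}".format(statuses))
            if statuses and all(s == "FINISHED" for s in statuses):
                return statuses
            time.sleep(self.poll_seconds)

    def execute(self, context):
        # Airflow calls execute(); any exception raised here fails the task.
        return self.run_note()
```

In a DAG file this would be used like any other operator, e.g.
ZeppelinNoteOperator(task_id="run_note", zeppelin_url=..., note_id=...),
with the dependencies between notes expressed in Airflow as usual.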

If Zeppelin's scheduler is pluggable then it should be possible to create a
plugin that talks to the Airflow REST API.
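
A real plugin would live on the JVM inside Zeppelin, but the call it would
make is simple enough to show in Python.  This sketch only builds the request
that triggers a DAG run and does not send it.  The path assumes the stable
REST API of Airflow 2.x (/api/v1/dags/<dag_id>/dagRuns); older releases
exposed an experimental endpoint instead, so the path would need adjusting
for those.  The function name and the example conf keys are made up, and
authentication is omitted.

```python
import json
import urllib.request


def build_trigger_request(airflow_url, dag_id, conf=None):
    """Build (but do not send) the request that starts one DAG run."""
    url = "{}/api/v1/dags/{}/dagRuns".format(airflow_url.rstrip("/"), dag_id)
    payload = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Example: hand the note id to the DAG through the run's conf.
request = build_trigger_request(
    "http://airflow:8080", "zeppelin_notes", conf={"note_id": "2ABC"})
# urllib.request.urlopen(request) would send it.
```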

I happen to prefer Zeppelin to Jupyter - although I get your point about
both being Python.  I don't really view that as a problem - most of the big
data platforms I'm talking to are implemented on the JVM after all.  The
Python part of Airflow is really just describing what gets run, and it isn't
hard to run something that isn't written in Python.

On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> We also use both Zeppelin and Airflow.
>
> I'm interested in hearing what others are doing here too.
>
> Although honestly there might be some challenges
> - Airflow expects a DAG structure, while a notebook has pretty linear
> structure;
> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
> help?).
> Jupyter+Airflow might be a more natural fit to integrate?
>
> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
> while Airflow is for more finalized workflows I guess?
>
> Thanks for bringing this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> Hi all,
>>
>> We are really enjoying the workflow of interacting with our data via
>> Zeppelin, but are not sold on using the built in cron scheduling
>> capability.  We would like to be able to create more complex DAGs that are
>> better suited for something like Airflow.  I was curious as to whether
>> anyone has done an integration of Zeppelin with Airflow.
>>
>> Either directly from within Zeppelin, or from the Airflow side.
>>
>> Thanks,
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>> <http://www.shopkick.com/>
>> <https://www.facebook.com/shopkick> <https://www.instagram.com/shopkick/>
>> <https://www.pinterest.com/shopkick/> <https://twitter.com/shopkickbiz>
>> <https://www.linkedin.com/company-beta/831240/?pathWildcard=831240>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead
