I am interested in experimenting with modeling Airflow DAGs, along with
their runs, using a dynamic graph neural network. Why? Mostly as an example: I
think it would be cool to inspect the DAGs, then choose a task such as
predicting runtimes and train a node embedding as part of a blog post. It
would be great if the network were trained on a real-world workload, so that
it could actually be useful as a starting point for people doing ML
around orchestration with Airflow.

To do this, I need a paired repo/log dataset: one that includes both a set
of Airflow DAGs and the run logs associated with them.
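
To make "paired" concrete, here is a rough sketch of the kind of record I
have in mind: the DAG structure as an edge list, plus per-task durations
pulled from the run logs. Everything in it (the class names, the fields,
the "example_etl" DAG and its numbers) is made up for illustration. It is
not an Airflow API or an existing dataset format.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class TaskRun:
    """One task execution, as it might be extracted from run logs (hypothetical schema)."""
    run_id: str
    task_id: str
    duration_s: float


@dataclass
class DagRecord:
    """A DAG's structure paired with its observed task runs (hypothetical schema)."""
    dag_id: str
    edges: List[Tuple[str, str]]  # (upstream_task, downstream_task)
    runs: List[TaskRun] = field(default_factory=list)

    def task_ids(self) -> List[str]:
        # Nodes are every task named in an edge or seen in a run.
        seen = {t for edge in self.edges for t in edge}
        seen |= {r.task_id for r in self.runs}
        return sorted(seen)

    def mean_runtimes(self) -> Dict[str, float]:
        # A simple per-node target: mean duration over all observed runs.
        by_task: Dict[str, List[float]] = {}
        for r in self.runs:
            by_task.setdefault(r.task_id, []).append(r.duration_s)
        return {task: mean(durations) for task, durations in by_task.items()}


# Invented example data, only to show the shape of one record.
record = DagRecord(
    dag_id="example_etl",
    edges=[("extract", "transform"), ("transform", "load")],
    runs=[
        TaskRun("run_1", "extract", 10.0),
        TaskRun("run_2", "extract", 14.0),
        TaskRun("run_1", "transform", 30.0),
    ],
)

print(record.task_ids())
print(record.mean_runtimes())
```

The edge list would feed the graph side of the model, and the per-task
durations would serve as node-level targets for something like runtime
prediction.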

Does anyone know of a publicly available source for this kind of data? Is
this something I could easily generate by executing the examples or unit tests?

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
[email protected] LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
