Hi, I've put together a proof of concept for making DML a first-class citizen in Apache Zeppelin.
Brief intro to Zeppelin: Zeppelin is a "notebook" interface for interacting with Spark, Cassandra, Hive, and other projects. It can be thought of as a REPL in a browser. Small units of code are put into "cells", and these individual cells can then be run interactively. There is also support for queueing up and running cells in parallel. Cells are contained in notebooks; notebooks can be exported and persist between sessions.

One can write (Scala) Spark code in cell 1 and save a DataFrame object, then write PySpark code in cell 2 and access the previously saved DataFrame. The Zeppelin runtime makes this possible by injecting a special variable called "z" into the Spark and PySpark environments. This "z" is an object of type ZeppelinContext and exposes "get" and "put" methods. DML in Spark mode can now access this feature as well.

In this POC, DML can operate in two modes: standalone and Spark.

Screenshots of it working: http://imgur.com/a/m7ASx
GIF of the screenshots: http://i.imgur.com/NttMuKC.gifv
Instructions: https://gist.github.com/anonymous/6ab8c569b2360232e252
JIRA: https://issues.apache.org/jira/browse/SYSTEMML-542

Nakul Jindal
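To illustrate the cross-cell sharing described above, here is a sketch of two Zeppelin cells using ZeppelinContext's put/get (the input file, variable names, and the key "people" are illustrative, not from the POC):

```
%spark
// Cell 1 (Scala Spark): build a DataFrame and share it via ZeppelinContext
val df = sqlContext.read.json("people.json")  // hypothetical input file
z.put("people", df)
```

```
%pyspark
# Cell 2 (PySpark): retrieve the DataFrame that cell 1 stored under "people"
df = z.get("people")
df.show()
```

The POC extends this same mechanism to DML cells running in Spark mode, so data can flow between DML and the other Spark-backed interpreters.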