Jennifer Jones created ZEPPELIN-1448:
----------------------------------------

             Summary: Add CPU time to paragraph output for %pyspark, %spark, 
%python, and %sql interpreters
                 Key: ZEPPELIN-1448
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1448
             Project: Zeppelin
          Issue Type: New Feature
          Components: Interpreters, pySpark, python-interpreter, 
zeppelin-interpreter, zeppelin-server, zeppelin-zengine
    Affects Versions: 0.6.1
         Environment: Mac OS X El Capitan Version 10.11.6, Spark 2.0.0, 
Zeppelin 0.6.1, and the Anaconda distribution of Python 3.5.2.
            Reporter: Jennifer Jones
             Fix For: 0.6.2, 0.7.0, 0.6.1


In Zeppelin, when using the PySpark interpreter (%pyspark) in a cell or 
"paragraph," I want the output to list the CPU time in addition to the 
evaluated output and the actual (physical) time elapsed.

A specific example of a statement I want to time (not necessarily limited to 
SQL queries) is something like this:

%pyspark
...
sqlctx = SQLContext(sc)
...
sqlctx.sql("SELECT feature1, feature2, feature3 FROM tableName " + 
           "WHERE feature3 = 'a' LIMIT 100").show()

or a SQL count over the total number of rows in the table.

If I use Zeppelin with the Hive interpreter in a paragraph (%hive), the output 
automatically includes actual time, CPU time, and the evaluated output.

Similarly, if I use IPython (either in a shell or in a Jupyter notebook), I can 
preface a statement with %time to have the output returned along with the CPU 
time.
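
For illustration, here is a minimal sketch of the kind of measurement I mean, 
assuming plain Python 3.3 or later. The helper name run_timed is hypothetical 
and is not part of Zeppelin, PySpark, or IPython. Note that 
time.process_time() only counts CPU time of the current Python process, so in 
a %pyspark paragraph it would not include work done in the JVM or on Spark 
executors.

```python
import time


def run_timed(fn, *args, **kwargs):
    """Call fn and return (result, wall_seconds, cpu_seconds).

    Hypothetical helper for illustration only.
    """
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = fn(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# Example: time a CPU-bound call and report both figures.
result, wall, cpu = run_timed(sum, range(1000000))
print("CPU time: %.3f s, wall time: %.3f s" % (cpu, wall))
```

In a paragraph, the statement to be timed would be wrapped in a function or 
lambda and passed to the helper, which is the manual step I would like 
Zeppelin to do automatically.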

Please add the capability to return CPU time for a paragraph in a Zeppelin 
notebook (%pyspark and the other interpreters named in the summary).




