GitHub user LantaoJin opened a pull request:

    https://github.com/apache/spark/pull/20876

    [SPARK-23653][SQL] Capture sql statements user input and show them in…

    … SQL UI
    
    ## What changes were proposed in this pull request?
    
    [SPARK-4871](https://issues.apache.org/jira/browse/SPARK-4871) already 
    added the SQL statement to the job description when using spark-sql, but it 
    has some problems:
    
    1. A long SQL statement cannot be fully displayed in the description column.
    ![screen shot 2018-03-12 at 14 25 51](https://user-images.githubusercontent.com/1853780/37287438-c833385e-263f-11e8-86ea-0f8ebb9b151e.png)
    
    
    2. Variables such as `${var}` and `${env:var}` in the SQL are not resolved. For example:
    `spark-sql --hiveconf a=avalue --hivevar b=bvalue`
    `spark-sql> select '${a}', '${b}';`
    
    ![screen shot 2018-03-22 at 14 53 
03](https://user-images.githubusercontent.com/1853780/37755515-c78b7da0-2de0-11e8-807d-7e8ed3b859eb.png)
    
    
    3. SQL statements submitted through spark-shell or spark-submit are not covered.
    
    At eBay, most Spark SQL applications, such as ETL and reporting, use 
    spark-submit to schedule their jobs with a few SQL files. The SQL statements 
    in those applications cannot be seen in the current Spark UI.
    
    ![screen shot 2018-03-12 at 20 16 
23](https://user-images.githubusercontent.com/1853780/37287410-bde5166a-263f-11e8-8435-8db29a2eef33.png)
    
    A more detailed scenario: Team A developed a framework that submits an 
    application with its SQL statements in a file:
    > spark-submit --master yarn-cluster --class com.ebay.SQLFramework -s biz.sql
    
    biz.sql contains many SQL statements, such as:
    > create or replace temporary view view_a select xx from table ${old_db}.table_a where dt=${check_date};
    > insert overwrite table ${new_db}.table_a select xx from view_a join ${new_db}.table_b;
    > ...
    
    Team B (Platform) needs to capture the actual SQL statements that are 
    executed across the whole cluster, because the SQL files from Team A contain 
    many variables. A better approach is to record the actual SQL statements in 
    the EventLog.
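
To illustrate why the text in a file like biz.sql differs from the statement that actually runs, here is a minimal Python sketch of `${name}` substitution. It is not Spark's implementation: `resolve_variables` is a hypothetical helper, the value `prod_db` is made up for the example, and prefixed forms such as `${env:var}` are not handled specially.

```python
import re

# Matches ${name}; the name is everything up to the closing brace.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_variables(sql, variables):
    """Hypothetical helper: expand ${name} placeholders from a dict.

    Names missing from `variables` are left untouched, so unresolved
    placeholders stay visible in the output.
    """
    def replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(replace, sql)


# Statement as it appears in the SQL file (taken from the example above).
template = (
    "insert overwrite table ${new_db}.table_a "
    "select xx from view_a join ${new_db}.table_b"
)

# Assumed variable value, for illustration only.
resolved = resolve_variables(template, {"new_db": "prod_db"})
print(resolved)
```

Only the resolved string tells a platform team which tables were actually touched, which is the motivation for recording it rather than the template.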
    
    ## How was this patch tested?
    ![screen shot 2018-03-21 at 23 22 
07](https://user-images.githubusercontent.com/1853780/37718931-ceb341c6-2d5e-11e8-8f41-4f53a7d83d99.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/LantaoJin/spark SPARK-23653-NEW

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20876
    
----
commit b147b27d6241f1654e7770a5518f66a61d8430e1
Author: LantaoJin <jinlantao@...>
Date:   2018-03-22T06:47:57Z

    [SPARK-23653][SQL] Capture sql statements user input and show them in SQL UI

----

