Github user LantaoJin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20803#discussion_r175683700
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -166,20 +168,28 @@ private[sql] object Dataset {
     class Dataset[T] private[sql](
         @transient val sparkSession: SparkSession,
        @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution,
    -    encoder: Encoder[T])
    +    encoder: Encoder[T],
    +    val sqlText: String = "")
    --- End diff --
    
    Thanks for your review. I agree with this comment. Before the discussion, let me reproduce the scenario our company met. Team A developed a framework to submit an application with SQL statements in a file:
    > spark-submit --master yarn-cluster --class com.ebay.SQLFramework -s biz.sql
    
    In biz.sql, there are many SQL statements like:
    > create or replace temporary view view_a as select xx from ${old_db}.table_a where dt=${check_date};
    > insert overwrite table ${new_db}.table_a select xx from view_a join ${new_db}.table_b;
    > ...
    
    There are no cases like
    `val df = spark.sql("xxxxx")`
    `spark.range(10).collect()`
    `df.filter(..).count() `
    
    Team B (Platform) needs to capture the actual SQL statements executed across the whole cluster, because the SQL files from Team A contain many variables that are only resolved at runtime. A better way is to record the resolved SQL statement in the event log.
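
    To make that concrete, here is a minimal sketch of what such a framework does (the class and variable names are hypothetical, not the real com.ebay.SQLFramework code): it substitutes the ${...} placeholders and only then calls `spark.sql`, so the resolved text exists only on the driver at runtime and the event log is the natural place to capture it.

```scala
import scala.io.Source

import org.apache.spark.sql.SparkSession

object SQLFrameworkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // Hypothetical variable values, e.g. parsed from job arguments.
    val vars = Map("old_db" -> "db_v1", "new_db" -> "db_v2", "check_date" -> "2018-03-20")

    // args(0) is the SQL file passed via -s, e.g. biz.sql.
    val raw = Source.fromFile(args(0)).mkString
    raw.split(";").map(_.trim).filter(_.nonEmpty).foreach { stmt =>
      // Substitute ${var} placeholders; only this resolved text ever reaches Spark.
      val resolved = vars.foldLeft(stmt) { case (s, (k, v)) => s.replace("${" + k + "}", v) }
      spark.sql(resolved)
    }
  }
}
```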
    
    OK, back to the discussion. The original purpose is to display the SQL statement the user inputs. `spark.range(10).collect()` isn't a SQL statement the user inputs, nor is `df.filter(..).count()`. Only "xxxxx" is. So I have three proposals.
    1. Change the display behavior so that only SQL that triggers an action is displayed, e.g. "create table", "insert overwrite", etc., and ignore plain select statements. Then there is no need to propagate the SQL text any more, but the test case above won't show anything in the SQL UI.
    2. Add a SQLCommandEvent and post it with the SQL text in SparkSession.sql(); the EventLoggingListener then just writes it to the event log (see the sketch after this list).
    3. Open another ticket to add a command-line option `--sqlfile biz.sql` to spark-submit. biz.sql must be a file consisting of SQL statements. Based on this implementation, not only client mode but also cluster mode can run pure SQL.
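
    For proposal 2, here is a rough sketch of what the event and a listener consuming it could look like. The event name, fields, and where exactly it is posted are assumptions; whether EventLoggingListener writes it out as-is or needs a dedicated handler is part of the design to discuss.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}

// Hypothetical event carrying the resolved SQL text. SparkSession.sql() would post it
// to the listener bus right after being called, e.g.
//   sparkContext.listenerBus.post(SQLCommandEvent(sqlText, System.currentTimeMillis()))
case class SQLCommandEvent(sqlText: String, time: Long) extends SparkListenerEvent

// A listener (EventLoggingListener or a dedicated one) receives custom events via
// onOtherEvent and can persist them to the event log or any other sink.
class SQLCommandLogger extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case SQLCommandEvent(sql, time) => println(s"[$time] executed SQL: $sql")
    case _ => // ignore other events
  }
}
```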
    
    What do you think? @cloud-fan


