Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20803#discussion_r175683700
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -166,20 +168,28 @@ private[sql] object Dataset {
class Dataset[T] private[sql](
@transient val sparkSession: SparkSession,
@DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution,
- encoder: Encoder[T])
+ encoder: Encoder[T],
+ val sqlText: String = "")
--- End diff ---
Thanks for your review. I agree with this comment. Before the discussion, let me
describe the scenario our company ran into. Team A developed a framework to submit
an application with SQL statements in a file:
> spark-submit --master yarn-cluster --class com.ebay.SQLFramework -s biz.sql
In biz.sql there are many SQL statements like:
> create or replace temporary view view_a as select xx from ${old_db}.table_a where dt=${check_date};
> insert overwrite table ${new_db}.table_a select xx from view_a join ${new_db}.table_b;
> ...
There are no cases like
`val df = spark.sql("xxxxx")`
`spark.range(10).collect()`
`df.filter(..).count()`
Team B (Platform) needs to capture the actual SQL statements that are executed
across the whole cluster, since the SQL files from Team A contain many variables.
A better approach is to record the actual SQL statement in the EventLog.
OK, back to the discussion. The original purpose is to display the SQL statement
that the user inputs. `spark.range(10).collect()` isn't a SQL statement the user
inputs, and neither is `df.filter(..).count()`. Only "xxxxx" is. So I have three
proposals.
1. Change the display behavior to only show SQL that triggers an action, like
"create table", "insert overwrite", etc., and ignore plain select statements. That
way the SQL text no longer needs to be propagated. The test case above would show
nothing in the SQL UI.
2. Add a SQLCommandEvent and post it with the SQL statement in
`SparkSession.sql()`; then the EventLoggingListener just writes it to the event
log (see the sketch after this list).
3. Open another ticket to add a command-line option `--sqlfile biz.sql` to the
spark-submit command. biz.sql must be a file consisting of SQL statements. Based on
this implementation, not only client mode but also cluster mode can use pure SQL.
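A rough sketch of what proposal 2 could look like (the event name and exact wiring
are illustrative only, not a final design):
```scala
import org.apache.spark.scheduler.SparkListenerEvent

// Hypothetical event type carrying the SQL text the user submitted.
case class SQLCommandEvent(sqlText: String) extends SparkListenerEvent

// In SparkSession.sql(), post the event before building the Dataset,
// so every statement that goes through spark.sql() reaches the listener bus:
def sql(sqlText: String): DataFrame = {
  sparkContext.listenerBus.post(SQLCommandEvent(sqlText))
  Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
}
```
Since EventLoggingListener logs the other SparkListenerEvents it receives through
its onOtherEvent handler, Team B should then be able to read the event log and
recover the actual SQL text with all variables substituted.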
What do you think? @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]