wangshuo128 commented on a change in pull request #31968:
URL: https://github.com/apache/spark/pull/31968#discussion_r608490932
##########
File path:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
##########
@@ -64,9 +65,15 @@ private[hive] class SparkSQLDriver(val context: SQLContext =
SparkSQLEnv.sqlCont
new VariableSubstitution().substitute(command)
}
context.sparkContext.setJobDescription(substitutorCommand)
- val execution =
context.sessionState.executePlan(context.sql(command).logicalPlan)
- hiveResponse = SQLExecution.withNewExecutionId(execution) {
Review comment:
When running `sql("show tables").collect`, the SQL UI shows two identical SQL queries, both executing `ShowTablesCommand`. This differs from running "show tables" in spark-sql.
The reason is that:
1. Running "show tables" in spark-sql
We create a DataFrame from the "show tables" SQL; its `logicalPlan` is a `LocalRelation`. We then create a `QueryExecution` from that `LocalRelation` and convert it to a Hive result string; see `SparkSQLDriver.run`:
https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala#L67-L70
So the executed query plan only collects from the `LocalRelation`.
2. Running `sql("show tables").collect`
When `collect` is called, the `Dataset` directly uses its own `QueryExecution`; see
https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2983
So the executed query plan is the `ShowTablesCommand` itself.
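The contrast between the two entry points can be sketched as a toy model (plain Scala, not Spark code; the UI-entry strings are purely illustrative):

```scala
import scala.collection.mutable.ListBuffer

// Toy model of which entries each entry point records in the SQL UI.
// "ShowTablesCommand" runs eagerly when the command DataFrame is created;
// what the second execution wraps then depends on the caller.
object DuplicateEntrySketch {
  // spark-sql path: SparkSQLDriver builds a fresh QueryExecution on the
  // LocalRelation holding the command's cached results, so the second
  // UI entry is only a local collect, not the command again.
  def sparkSqlPath(): Seq[String] = {
    val ui = ListBuffer.empty[String]
    ui += "ShowTablesCommand"       // eager command execution in context.sql
    ui += "Collect(LocalRelation)"  // driver's extra QueryExecution
    ui.toSeq
  }

  // sql(...).collect path: the Dataset reuses its own QueryExecution,
  // whose plan is still the command, so the same entry appears twice.
  def datasetCollectPath(): Seq[String] = {
    val ui = ListBuffer.empty[String]
    ui += "ShowTablesCommand"       // eager command execution in sql(...)
    ui += "ShowTablesCommand"       // collect re-wraps the same plan
    ui.toSeq
  }

  def main(args: Array[String]): Unit = {
    println(sparkSqlPath().count(_ == "ShowTablesCommand"))       // 1
    println(datasetCollectPath().count(_ == "ShowTablesCommand")) // 2
  }
}
```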
So it seems we can't solve the problem by checking for `LocalRelation.fromCommand` in `SQLExecution.withNewExecutionId`.
I'm thinking of another way:
1. Unify the behavior of `SparkSQLDriver.run` and `Dataset.collect`: don't create an extra `QueryExecution` in `SparkSQLDriver.run` (what was the original purpose of creating a new `QueryExecution`?)
```
val execution = context.sql(command).queryExecution
```
2. Add an `isCommand` flag to `Dataset`.
3. Pass the `Dataset` to `SQLExecution.withNewExecutionId` and check `Dataset.isCommand` there.
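A rough sketch of how steps 2 and 3 could fit together (toy model in plain Scala; `withNewExecutionId` here is a stand-in for the real `SQLExecution.withNewExecutionId`, not its actual signature, and the `isCommand` flag is the hypothetical addition from step 2):

```scala
import scala.collection.mutable.ListBuffer

// Toy model of the proposed change: when the Dataset wraps a command that
// already ran eagerly (isCommand = true), withNewExecutionId skips
// registering a second execution in the UI.
object ExecutionIdSketch {
  val uiEntries = ListBuffer.empty[Long]
  private var nextId = 0L

  // Hypothetical isCommand flag carried on the Dataset (step 2).
  final case class Dataset(isCommand: Boolean)

  // Stand-in for SQLExecution.withNewExecutionId (step 3): only record a
  // new execution id for non-command queries.
  def withNewExecutionId[T](ds: Dataset)(body: => T): T = {
    if (!ds.isCommand) {
      nextId += 1
      uiEntries += nextId // would surface as a new query in the SQL UI
    }
    body
  }

  def main(args: Array[String]): Unit = {
    // SHOW TABLES already executed eagerly; collect adds no second entry.
    withNewExecutionId(Dataset(isCommand = true)) { () }
    // An ordinary SELECT still gets its own execution id.
    withNewExecutionId(Dataset(isCommand = false)) { () }
    println(uiEntries.size) // 1
  }
}
```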
WDYT?