Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

via GitHub Wed, 31 Jul 2024 07:20:35 -0700


davidm-db commented on code in PR #47403:
URL: https://github.com/apache/spark/pull/47403#discussion_r1698602295



##########
sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -650,14 +657,27 @@ class SparkSession private(
   private[sql] def sql(sqlText: String, args: Array[_], tracker: QueryPlanningTracker): DataFrame =
     withActive {
       val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
-        val parsedPlan = sessionState.sqlParser.parsePlan(sqlText)
-        if (args.nonEmpty) {
-          PosParameterizedQuery(parsedPlan, args.map(lit(_).expr).toImmutableArraySeq)
-        } else {
-          parsedPlan
+        val parsedPlan = sessionState.sqlParser.parseScript(sqlText)
+        parsedPlan match {
+          case CompoundBody(Seq(singleStmtPlan: SingleStatement), label) if args.nonEmpty =>
+            CompoundBody(Seq(SingleStatement(
+              PosParameterizedQuery(
+                singleStmtPlan.parsedPlan, args.map(lit(_).expr).toImmutableArraySeq))), label)
+          case p =>
+            assert(args.isEmpty, "Named parameters are not supported for batch queries")
+            p
         }
       }
-      Dataset.ofRows(self, plan, tracker)
+
+      plan match {
+        case CompoundBody(Seq(singleStmtPlan: SingleStatement), _) =>
+          Dataset.ofRows(self, singleStmtPlan.parsedPlan, tracker)
+        case _ =>
+          // execute the plan directly if it is not a single statement
+          val lastRow = executeScript(plan).foldLeft(Array.empty[Row])((_, next) => next)

Review Comment:
   Let's consider whether we want to do this exactly this way, because:
   - `executeScript` is basically a one-line alias for the interpreter's `execute` function
   - when we introduce multiple results in the future, it seems best to:
     - have `executeMultipleResults` in the interpreter
     - have each function (`execute`, `executeMultipleResults`, and maybe something new) collect data based on the type of data it needs to return
   
   I propose that the `execute` family of methods in the interpreter be responsible for deciding which data is returned, instead of fetching the last row here in `SparkSession`.
   
   I haven't written many details here; this comment is a reminder, and we can discuss more offline.
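
To make the proposal concrete, here is a minimal sketch of how the "which data is returned" decision could live in the interpreter rather than in `SparkSession`. Everything in it is a hypothetical stand-in for illustration: `Row`, `Statement`, and `ScriptInterpreter` are not Spark's real classes, and the signatures of `execute` and `executeMultipleResults` are assumptions, since the comment does not specify them.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's types.
final case class Row(values: Seq[Any])

// A statement reduced to "something that yields rows when run".
final case class Statement(run: () => Array[Row])

object ScriptInterpreter {
  // Runs the statements lazily, in order, yielding each statement's rows.
  private def runAll(script: Seq[Statement]): Iterator[Array[Row]] =
    script.iterator.map(_.run())

  // Single-result variant: the interpreter (not the caller) decides that
  // "the result" is the rows of the last statement. An empty script
  // yields an empty array.
  def execute(script: Seq[Statement]): Array[Row] =
    runAll(script).foldLeft(Array.empty[Row])((_, next) => next)

  // Multi-result variant (assumed shape): keeps the rows of every statement.
  def executeMultipleResults(script: Seq[Statement]): Vector[Array[Row]] =
    runAll(script).toVector
}
```

With this split, a caller such as `SparkSession.sql` would only pick which `execute` variant to call and would no longer need its own `foldLeft` to find the last result.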


