wForget commented on code in PR #45373:
URL: https://github.com/apache/spark/pull/45373#discussion_r1513704269


##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -4483,6 +4478,17 @@ class Dataset[T] private[sql](
     }
   }
 
+  /** Returns an optimized plan for CommandResult, converted to `LocalRelation`. */
+  private def withCommandResultOptimized: Dataset[T] = {
+    logicalPlan match {
+      case c: CommandResult =>
+        // Convert to `LocalRelation` and let `ConvertToLocalRelation` do the casting locally to
+        // avoid triggering a job
+        Dataset(sparkSession, LocalRelation(c.output, c.rows))

Review Comment:
   This doesn't seem to be a universal optimization. Since the `getRows` and `isEmpty` methods only add a simple `Project` on top of the `CommandResult`, they don't need to trigger a job. However, if the subsequent plan involves more complex operations, I think we could still trigger a job.
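   
   For illustration, the narrower shape I have in mind might look like the sketch below. This is only a sketch: `CommandResultInlining` and `inlineCommandResult` are hypothetical names, not existing Spark APIs. The idea is to keep the `CommandResult` -> `LocalRelation` conversion as a small rewrite that is only relied on where the caller adds at most a cheap operator such as a `Project`, so `ConvertToLocalRelation` can still fold everything locally.
   
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.{CommandResult, LocalRelation, LogicalPlan}
   
   // Hypothetical sketch (not the PR's actual change): inline the eagerly
   // collected CommandResult rows as a LocalRelation so that rules like
   // ConvertToLocalRelation can evaluate a simple Project locally, without
   // submitting a job.
   object CommandResultInlining {
     def inlineCommandResult(plan: LogicalPlan): LogicalPlan = plan match {
       case c: CommandResult =>
         // The command already ran and its rows are held in memory, so a
         // LocalRelation over the same output attributes is equivalent here.
         LocalRelation(c.output, c.rows)
       case other =>
         // Anything else is left untouched; more complex downstream operators
         // may legitimately run as a regular job.
         other
     }
   }
   ```
   
   That way call sites like `getRows` and `isEmpty` keep the local fast path, while other call sites keep the existing behavior.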



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

