Re: [PR] [SPARK-50130][SQL][PYTHON] Add DataFrame APIs for scalar and exists subqueries [spark]

via GitHub Tue, 12 Nov 2024 05:11:41 -0800


hvanhovell commented on code in PR #48664:
URL: https://github.com/apache/spark/pull/48664#discussion_r1838077151



##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -95,9 +95,14 @@ private[sql] object Dataset {
   def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): DataFrame =
     sparkSession.withActive {
       val qe = sparkSession.sessionState.executePlan(logicalPlan)
-      qe.assertAnalyzed()
-      new Dataset[Row](qe, RowEncoder.encoderFor(qe.analyzed.schema))
-  }
+      val encoder = if (qe.isLazyAnalysis) {
+        RowEncoder.encoderFor(new StructType())

Review Comment:
   @ueshin this breaks `collect` (and other operations) on these DataFrames, 
since the encoder is built from an empty schema. An alternative would be to 
defer constructing the encoder until it is needed.
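
   A minimal sketch of what deferring the encoder could look like, using 
hypothetical stand-in types (`Schema`, `Encoder`, `LazyPlan`, `Frame`) rather 
than Spark's actual `Dataset`, `QueryExecution`, or `RowEncoder`. It only 
illustrates the idea: a `lazy val` postpones both analysis and encoder 
construction until an action needs them, so the encoder sees the real schema 
instead of an empty one. How this would fit into `Dataset.ofRows` is not 
something this sketch claims to show.

```scala
// Hypothetical stand-ins for illustration; these are NOT Spark classes.
final case class Schema(fieldNames: Seq[String])

final case class Encoder(schema: Schema)

// Stand-in for a query execution whose analysis may be deferred.
final class LazyPlan(resolveSchema: () => Schema) {
  // Counts how many times analysis actually ran.
  var analysisCount = 0

  // Analysis runs at most once, on first access.
  lazy val analyzedSchema: Schema = {
    analysisCount += 1
    resolveSchema()
  }
}

// Stand-in for a DataFrame built on top of a lazily analyzed plan.
final class Frame(plan: LazyPlan) {
  // Deferred: constructing a Frame does not force analysis. The encoder is
  // derived from the analyzed schema the first time something asks for it.
  lazy val encoder: Encoder = Encoder(plan.analyzedSchema)

  // An "action" that needs the encoder, standing in for collect().
  def collectFieldNames(): Seq[String] = encoder.schema.fieldNames
}

object DeferredEncoderSketch {
  def main(args: Array[String]): Unit = {
    val plan = new LazyPlan(() => Schema(Seq("id", "name")))
    val frame = new Frame(plan)

    // Nothing has been analyzed at construction time.
    assert(plan.analysisCount == 0)

    // The first action triggers analysis and builds the encoder.
    val names = frame.collectFieldNames()
    assert(plan.analysisCount == 1)
    println(names.mkString(","))
  }
}
```

   The trade-off in this sketch is that analysis errors surface at the first 
action rather than at construction, which matches the lazy-analysis intent in 
the diff above.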




