viirya commented on a change in pull request #34499:
URL: https://github.com/apache/spark/pull/34499#discussion_r745857029



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -322,7 +322,12 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with 
Logging with Serializ
    */
   private def getByteArrayRdd(
       n: Int = -1, takeFromEnd: Boolean = false): RDD[(Long, Array[Byte])] = {
-    execute().mapPartitionsInternal { iter =>
+    val rdd = if (supportsColumnar) {
+      ColumnarToRowExec(this).execute()

Review comment:
       `execute` and `executeColumnar` are APIs used internally for query 
execution, but `executeCollect` is somewhat different: it works more like a 
human-facing collect API, even though it is not actually end-user facing. 
Because it collects internal rows and decodes them, it is a convenient way to 
examine the output of an arbitrary `SparkPlan`.
   
   In other words, I can't use `SparkPlan.execute` or 
`SparkPlan.executeColumnar` to collect meaningful (readable) rows, but I think 
`executeCollect` should be able to do that. Previously, though, it only worked 
for non-columnar plans.
   
   The framework guarantees what you said for the entire query, but there is 
no `ColumnarToRowExec` after each individual columnar plan (some plans may be 
continuously columnar). During development, if developers want to examine an 
arbitrary columnar plan, they have to insert the conversion manually. Doing 
that every time feels verbose to me.
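   To illustrate the point, here is a minimal sketch of the manual wrapping a 
developer currently needs versus what this change allows. The `plan: SparkPlan` 
value and the surrounding context are hypothetical, not taken from the PR:
   
   ```scala
   // Before this PR: examining the readable output of an arbitrary plan
   // requires manually inserting the columnar-to-row conversion.
   val rows: Array[org.apache.spark.sql.catalyst.InternalRow] =
     if (plan.supportsColumnar) {
       // Wrap the columnar plan so its batches are decoded into rows.
       ColumnarToRowExec(plan).executeCollect()
     } else {
       plan.executeCollect()
     }
   
   // With this PR: executeCollect handles the conversion itself,
   // so the same call works for columnar and non-columnar plans alike.
   val rows2 = plan.executeCollect()
   ```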
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
