[GitHub] [spark] viirya commented on a change in pull request #34642: [SPARK-37369][SQL] Avoid redundant ColumnarToRow transistion on InMemoryTableScan

GitBox Thu, 09 Dec 2021 17:30:45 -0800


viirya commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r766277525




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##########
@@ -256,7 +256,8 @@ case class CachedRDDBuilder(
   }
 
   private def buildBuffers(): RDD[CachedBatch] = {
-    val cb = if (cachedPlan.supportsColumnar) {
+    val cb = if (cachedPlan.supportsColumnar &&
+        serializer.supportsColumnarInput(cachedPlan.output)) {

Review comment:
       This is actually a bug. `cachedPlan.supportsColumnar` only indicates that 
the cached plan can produce columnar output; whether this cached RDD builder can 
consume such input depends on its serializer.
   
   One test failed due to the proposed change. As I remember, it happens for an 
`InMemoryRelation` nested under another `InMemoryRelation`. 
   
   Previously we always added an additional `ColumnarToRow` transition between 
two `InMemoryRelation`s, so we didn't hit this. 
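   
   To illustrate the point, here is a minimal standalone sketch of the two-sided 
check. It does not use Spark's real classes: `Plan`, `Serializer`, 
`RowOnlySerializer`, `ColumnarSerializer`, `InputPath` and `chooseInputPath` 
are hypothetical stand-ins that I made up for illustration. Only the shape of 
the condition mirrors the diff above.

```scala
// Simplified model of the check in buildBuffers(). None of these types are
// Spark's real classes; they are hypothetical stand-ins for illustration.
object ColumnarInputSketch {
  // Stand-in for a cached plan: records whether it can emit columnar batches.
  final case class Plan(name: String, supportsColumnar: Boolean, output: Seq[String])

  // Stand-in for a cache serializer that may or may not accept columnar input.
  trait Serializer {
    def supportsColumnarInput(schema: Seq[String]): Boolean
  }

  // Assumed behavior: a serializer that can only consume rows.
  object RowOnlySerializer extends Serializer {
    def supportsColumnarInput(schema: Seq[String]): Boolean = false
  }

  // Assumed behavior: a serializer that can consume columnar batches.
  object ColumnarSerializer extends Serializer {
    def supportsColumnarInput(schema: Seq[String]): Boolean = true
  }

  sealed trait InputPath
  case object ColumnarInput extends InputPath
  case object RowInput extends InputPath

  // The producer side (plan) and the consumer side (serializer) must both
  // agree before the columnar path is taken; otherwise fall back to rows.
  def chooseInputPath(plan: Plan, serializer: Serializer): InputPath =
    if (plan.supportsColumnar && serializer.supportsColumnarInput(plan.output)) {
      ColumnarInput
    } else {
      RowInput
    }

  def main(args: Array[String]): Unit = {
    // Models an inner cached relation whose scan can emit columnar batches.
    val innerScan = Plan("innerScan", supportsColumnar = true, output = Seq("a", "b"))
    println(chooseInputPath(innerScan, RowOnlySerializer))
    println(chooseInputPath(innerScan, ColumnarSerializer))
  }
}
```

   In this model, checking only `plan.supportsColumnar` would send columnar 
batches to `RowOnlySerializer`, which is the mismatch described above.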



