[GitHub] [spark] cloud-fan commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

GitBox Thu, 30 Jul 2020 07:04:25 -0700


cloud-fan commented on a change in pull request #29067:
URL: https://github.com/apache/spark/pull/29067#discussion_r463017864




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
##########
@@ -130,34 +87,29 @@ case class InMemoryTableScanExec(
     val numOutputRows = longMetric("numOutputRows")
     // Using these variables here to avoid serialization of entire objects (if referenced
     // directly) within the map Partitions closure.
-    val relOutput: AttributeSeq = relation.output
-
-    filteredCachedBatches().mapPartitionsInternal { cachedBatchIterator =>
-      // Find the ordinals and data types of the requested columns.
-      val (requestedColumnIndices, requestedColumnDataTypes) =
-        attributes.map { a =>
-          relOutput.indexOf(a.exprId) -> a.dataType
-        }.unzip
+    val relOutput = relation.output
+    val serializer = relation.cacheBuilder.serializer
 
-      // update SQL metrics
-      val withMetrics = cachedBatchIterator.map { batch =>
+    // update SQL metrics
+    val withMetrics =
+      filteredCachedBatches().map{ batch =>
         if (enableAccumulatorsForTest) {
           readBatches.add(1)
         }
         numOutputRows += batch.numRows
         batch
       }
-
-      val columnTypes = requestedColumnDataTypes.map {
-        case udt: UserDefinedType[_] => udt.sqlType
-        case other => other
-      }.toArray
-      val columnarIterator = GenerateColumnAccessor.generate(columnTypes)
-      columnarIterator.initialize(withMetrics, columnTypes, requestedColumnIndices.toArray)
-      if (enableAccumulatorsForTest && columnarIterator.hasNext) {
-        readPartitions.add(1)
+    val rows = serializer.convertCachedBatchToInternalRow(withMetrics, relOutput, attributes, conf)
+    if (enableAccumulatorsForTest) {

Review comment:
       ah I see, makes sense




