sunchao commented on a change in pull request #32354:
URL: https://github.com/apache/spark/pull/32354#discussion_r625977975
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTable.scala
##########
@@ -504,9 +509,39 @@ private class BufferedRowsReader(
    index < partition.rows.length
  }
-  override def get(): InternalRow = addMetadata(partition.rows(index))
+  override def get(): InternalRow = {
+    val originalRow = partition.rows(index)
+    val values = new Array[Any](nonMetadataColumns.length)
+    nonMetadataColumns.zipWithIndex.foreach { case (col, idx) =>
+      values(idx) = extractFieldValue(col, tableSchema, originalRow)
+    }
+    addMetadata(new GenericInternalRow(values))
+  }
  override def close(): Unit = {}
+
+  private def extractFieldValue(
+      field: StructField,
+      schema: StructType,
+      row: InternalRow): Any = {
+    val index = schema.fieldIndex(field.name)
Review comment:
Good question. Looking at `PushDownUtils.pruneColumns`, I see that we
apply `SQLConf.resolver` when nested column pruning is enabled, but it seems we
don't when it is disabled. IMO we should have a better contract between
Spark and data source implementors w.r.t.
`SupportsPushDownRequiredColumns.pruneColumns`: Spark should guarantee that
the `requiredSchema` passed to the method is a "subset" of the
relation's schema (e.g., the table schema).
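
To illustrate the kind of defensive code a data source ends up writing without such a guarantee, here is a rough sketch (the helper and its names are hypothetical, not part of this PR): if the `requiredSchema` field names are not resolved against the table schema, a plain `schema.fieldIndex(field.name)` can throw, so the implementor has to fall back to something like a case-insensitive lookup.

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper, for illustration only: a lenient field lookup a data
// source might need if Spark does not guarantee that `requiredSchema` is a
// resolved subset of the table schema.
object LenientFieldLookup {
  def fieldIndex(schema: StructType, field: StructField): Int = {
    // Fast path: exact (case-sensitive) match, same as StructType.fieldIndex.
    schema.getFieldIndex(field.name).getOrElse {
      // Fallback: case-insensitive match, mimicking a lenient resolver.
      val idx = schema.fields.indexWhere(_.name.equalsIgnoreCase(field.name))
      if (idx < 0) {
        throw new IllegalArgumentException(
          s"Field ${field.name} does not exist in ${schema.simpleString}")
      }
      idx
    }
  }
}
```

If Spark guaranteed the "subset" property, the fallback wouldn't be needed and `fieldIndex` alone would suffice.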