cloud-fan commented on code in PR #39408:
URL: https://github.com/apache/spark/pull/39408#discussion_r1069318980


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala:
##########
@@ -258,6 +258,36 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
       val outputAttributes = readDataColumns ++ generatedMetadataColumns ++
         partitionColumns ++ constantMetadataColumns
 
+      // Rebind metadata attribute references in filters after the metadata attribute struct has
+      // been flattened. Only data filters can contain metadata references. After the rebinding
+      // all references will be bound to output attributes which are either
+      // [[FileSourceConstantMetadataAttribute]] or [[FileSourceGeneratedMetadataAttribute]] after
+      // the flattening from the metadata struct.
+      def rebindFileSourceMetadataAttributesInFilters(
+          filters: Seq[Expression]): Seq[Expression] = {
+        // The row index field attribute got renamed.
+        def newFieldName(name: String) = name match {
+          case FileFormat.ROW_INDEX => FileFormat.ROW_INDEX_TEMPORARY_COLUMN_NAME
+          case other => other
+        }
+
+        filters.map { filter =>
+          filter.transform {
+            // Replace references to the _metadata column. This will affect references to the column
+            // itself but also where fields from the metadata struct are used.
+            case FileSourceMetadataAttribute(
+                AttributeReference("_metadata", fields @ StructType(_), _, _)) =>

Review Comment:
   This reminds me of a similar code block in this file
   ```
         // Metadata attributes are part of a column of type struct up to this point. Here we extract
         // this column from the schema.
         val metadataStructOpt = l.output.collectFirst {
           case FileSourceMetadataAttribute(attr) => attr
         }
   ```
   Shall we unify them?
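   The filter rebinding described in the diff comment can be sketched outside Spark with a toy expression model. Everything below is hypothetical and is not Spark's API: `Expr`, `MetadataStruct`, `GetField`, `NamedStruct` and the `_tmp_metadata_row_index` name are stand-ins for the Catalyst classes and the `FileFormat` constants. The sketch only shows the two rewrites the comment describes. A `_metadata.<field>` extraction becomes a direct reference to the flattened column, with `row_index` renamed. A bare `_metadata` reference is rebuilt as a struct over the flattened columns. The traversal is top-down, like Catalyst's `transform`.

   ```scala
   // Hypothetical, simplified expression model; these are NOT Spark's Catalyst classes.
   sealed trait Expr {
     // Top-down rewrite: apply the rule to this node first, then recurse into the
     // children of whatever node the rule produced.
     def transformDown(rule: PartialFunction[Expr, Expr]): Expr =
       rule.lift(this).getOrElse(this) match {
         case GetField(child, name) => GetField(child.transformDown(rule), name)
         case EqualTo(l, r)         => EqualTo(l.transformDown(rule), r.transformDown(rule))
         case NamedStruct(fields)   => NamedStruct(fields.map { case (n, e) => n -> e.transformDown(rule) })
         case leaf                  => leaf
       }
   }
   final case class Attr(name: String) extends Expr                  // flat output attribute
   final case class MetadataStruct(fields: Seq[String]) extends Expr // unflattened `_metadata` column
   final case class GetField(child: Expr, name: String) extends Expr // struct field extraction
   final case class NamedStruct(fields: Seq[(String, Expr)]) extends Expr
   final case class EqualTo(left: Expr, right: Expr) extends Expr
   final case class Literal(value: Any) extends Expr

   object MetadataRebindSketch {
     // Assumed stand-ins for FileFormat.ROW_INDEX and FileFormat.ROW_INDEX_TEMPORARY_COLUMN_NAME.
     val RowIndex = "row_index"
     val RowIndexTemporaryColumnName = "_tmp_metadata_row_index"

     // The row index field is read under a temporary name after flattening.
     def newFieldName(name: String): String = name match {
       case RowIndex => RowIndexTemporaryColumnName
       case other    => other
     }

     def rebind(filters: Seq[Expr]): Seq[Expr] = filters.map { filter =>
       filter.transformDown {
         // `_metadata.<field>` becomes a direct reference to the flattened column.
         case GetField(MetadataStruct(_), name) => Attr(newFieldName(name))
         // A bare `_metadata` reference is rebuilt as a struct over the flattened columns.
         case MetadataStruct(fields) =>
           NamedStruct(fields.map(n => n -> Attr(newFieldName(n))))
       }
     }

     def main(args: Array[String]): Unit = {
       val metadata = MetadataStruct(Seq("file_name", RowIndex))
       val filter = EqualTo(GetField(metadata, RowIndex), Literal(5L))
       println(rebind(Seq(filter)))
     }
   }
   ```

   The field-extraction case is listed before the bare-struct case. In a top-down pass, `_metadata.row_index` is therefore collapsed to a single attribute rather than first expanded into a struct and then projected.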


