[GitHub] [spark] cloud-fan commented on a diff in pull request #39996: [SPARK-42423][SQL] Add metadata column file block start and length

via GitHub Mon, 20 Feb 2023 01:08:44 -0800


cloud-fan commented on code in PR #39996:
URL: https://github.com/apache/spark/pull/39996#discussion_r1111659641



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala:
##########
@@ -77,7 +77,10 @@ abstract class PartitioningAwareFileIndex(
     // be applied to files.
     val fileMetadataFilterOpt = dataFilters.filter { f =>
       f.references.nonEmpty && f.references.forall {
-        case FileSourceConstantMetadataAttribute(_) => true
+        case FileSourceConstantMetadataAttribute(metadataAttr) =>
+          // we only know block start and length after splitting files, so skip it here
+          metadataAttr.name != FileFormat.FILE_BLOCK_START &&
+            metadataAttr.name != FileFormat.FILE_BLOCK_LENGTH

Review Comment:
   Technically we can apply filters on file splits, but this is probably 
useless, since it makes no sense to filter on block start and length.
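
   To illustrate the predicate in the diff above, here is a minimal, 
dependency-free sketch. It is not the Spark implementation: `Attr`, 
`ConstantMetadataAttr`, `DataAttr`, `Filter`, and 
`isFileLevelMetadataFilter` are hypothetical stand-ins, the two string 
constants are assumed values for `FileFormat.FILE_BLOCK_START` and 
`FileFormat.FILE_BLOCK_LENGTH`, and the fallback `case _ => false` is 
assumed because the hunk does not show it.

```scala
// Simplified stand-ins for Spark's classes; every name here is hypothetical.
object MetadataFilterSketch {
  // Assumed string values; the diff only shows the constant names.
  val FileBlockStart = "file_block_start"
  val FileBlockLength = "file_block_length"

  sealed trait Attr { def name: String }
  // Models an attribute matched by FileSourceConstantMetadataAttribute.
  final case class ConstantMetadataAttr(name: String) extends Attr
  // Models an ordinary data column.
  final case class DataAttr(name: String) extends Attr

  // A filter is reduced to the attributes it references.
  final case class Filter(references: Seq[Attr])

  // A filter can be applied while listing files only if every reference is a
  // constant metadata attribute whose value is known before files are split.
  def isFileLevelMetadataFilter(f: Filter): Boolean =
    f.references.nonEmpty && f.references.forall {
      case ConstantMetadataAttr(n) => n != FileBlockStart && n != FileBlockLength
      case _ => false // assumed fallback, not visible in the hunk
    }

  def main(args: Array[String]): Unit = {
    val byPath = Filter(Seq(ConstantMetadataAttr("file_path")))
    val byBlock = Filter(Seq(ConstantMetadataAttr(FileBlockStart)))
    println(isFileLevelMetadataFilter(byPath))
    println(isFileLevelMetadataFilter(byBlock))
  }
}
```

   In this sketch a filter that touches block start or length is simply 
left out of the file-level filters, which matches the intent of the 
code comment in the diff.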



