Re: [PR] [SPARK-49723][SQL] Add Variant metrics to the JSON File Scan node [spark]

via GitHub Wed, 09 Oct 2024 11:12:25 -0700


gene-db commented on code in PR #48172:
URL: https://github.com/apache/spark/pull/48172#discussion_r1793952556



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala:
##########
@@ -72,9 +74,30 @@ case class PartitionedFile(
   }
 }
 
+/**
+ * Class used to store statistical data that is collected during a file scan and could be used to
+ * update the SQL metrics of the scan node. More members could be added to this class to collect
+ * metrics related to new features.
+ */
+case class FileScanMetrics(
+    topLevelVariantMetrics: Option[VariantMetrics] = None,

Review Comment:
   I don't understand how `FileScanMetrics` and `topLevelVariantMetrics`/`nestedVariantMetrics` are used.
   
   It looks like there is only one caller using `new FileScanMetrics()`, and that caller always provides both of these variant metrics. When would we provide `None` for both or either of them? Do you foresee more callers that would change the option values?
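   To make the question concrete, here is a minimal sketch of the situation described above. It is not code from the PR. `VariantMetrics` is a hypothetical stand-in, because its real fields are not in the quoted hunk. `nestedVariantMetrics` is taken from the comment, since the hunk cuts off after the first member. `FileScanMetricsSketch`, `fromCaller` and `allDefaults` are illustrative names only. The sketch shows that the `None` defaults are only reached by a no-argument construction, which, per the comment, no caller makes today.

```scala
// Hypothetical stand-in: the real VariantMetrics from the PR is not shown in this hunk.
final case class VariantMetrics(variantCount: Long = 0L)

// Same shape as the class under review: both members are Options defaulting to None.
// (Only topLevelVariantMetrics appears in the quoted diff; nestedVariantMetrics is
// assumed from the review comment.)
final case class FileScanMetrics(
    topLevelVariantMetrics: Option[VariantMetrics] = None,
    nestedVariantMetrics: Option[VariantMetrics] = None)

object FileScanMetricsSketch {
  // What the single call site does, per the review: it always supplies both values.
  val fromCaller: FileScanMetrics = new FileScanMetrics(
    topLevelVariantMetrics = Some(VariantMetrics(3L)),
    nestedVariantMetrics = Some(VariantMetrics(1L)))

  // The only way to reach the None defaults. The review notes no caller does this today.
  val allDefaults: FileScanMetrics = FileScanMetrics()

  def main(args: Array[String]): Unit = {
    println(fromCaller)  // both members are Some(...)
    println(allDefaults) // FileScanMetrics(None,None)
  }
}
```

   If every caller always passes both values, the `Option` wrapping and the `None` defaults add no information. That is the crux of the question.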


