Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics for V1 [spark]

via GitHub Tue, 21 May 2024 13:01:57 -0700


snmvaughan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2123350238


   @cloud-fan Spark already collects the number of rows and bytes written, but 
only reports the aggregate total.  If you're concerned about the overall size 
of the collected metrics, it is bounded by the number of partitions, because 
the metrics are collected per partition rather than per file.  The current V1 
writers only know about the path they are writing to, which is why I wanted to 
augment `newFile` with additional information.
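
To illustrate the idea, here is a minimal sketch of a tracker that keeps 
totals per partition rather than per file.  It is a toy model in Python, not 
Spark's Scala API and not the code in this PR: the class names, the method 
names, and the optional `partition` argument on `new_file` are all 
hypothetical and stand in for the "additional information" mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PartitionStats:
    """Running totals for one partition (hypothetical structure)."""
    num_files: int = 0
    num_rows: int = 0
    num_bytes: int = 0


class PartitionStatsTracker:
    """Toy model of a write-task stats tracker keyed by partition.

    Assumption: new_file receives the partition alongside the path.
    A writer that only passes the path cannot be attributed to a partition.
    """

    def __init__(self) -> None:
        # One entry per partition, so this map grows with the number of
        # partitions, not with the number of files written.
        self._by_partition: Dict[str, PartitionStats] = defaultdict(PartitionStats)
        # Only files that are currently open; entries are removed on close.
        self._open_files: Dict[str, str] = {}

    def new_file(self, path: str, partition: Optional[str] = None) -> None:
        # Unpartitioned writes are grouped under the empty key.
        key = partition if partition is not None else ""
        self._open_files[path] = key
        self._by_partition[key].num_files += 1

    def new_row(self, path: str) -> None:
        self._by_partition[self._open_files[path]].num_rows += 1

    def close_file(self, path: str, size_in_bytes: int) -> None:
        key = self._open_files.pop(path)
        self._by_partition[key].num_bytes += size_in_bytes

    def per_partition(self) -> Dict[str, PartitionStats]:
        return dict(self._by_partition)

    def totals(self) -> PartitionStats:
        # The aggregate that is reported today can still be derived.
        total = PartitionStats()
        for stats in self._by_partition.values():
            total.num_files += stats.num_files
            total.num_rows += stats.num_rows
            total.num_bytes += stats.num_bytes
        return total
```

In this model the aggregate totals remain available through `totals()`, while 
the retained state is one small record per partition plus one entry per file 
that is still open.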


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


