snmvaughan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2152813241
@cloud-fan Did you still have concerns about collecting and reporting the
stats per partition?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
snmvaughan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2123350238
@cloud-fan Spark already collects information about the number of rows and
bytes written, but only reports the total aggregate. If you're concerned about
the overall size, it is
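The comment is truncated in the archive, but the distinction it draws (a single job-level total versus per-partition reporting) can be illustrated with a small sketch. This is illustrative Scala only, not Spark code; the partition names and byte counts are invented. It shows that a balanced write and a heavily skewed write can produce identical aggregate totals, which is exactly the information lost when only the aggregate is reported:

```scala
// Illustration only (not Spark code): job-level totals hide partition skew.
object SkewDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical bytes written per partition for two jobs.
    val balanced = Map("p=a" -> 500L, "p=b" -> 500L)
    val skewed   = Map("p=a" -> 990L, "p=b" -> 10L)
    // The aggregate is the same for both jobs...
    assert(balanced.values.sum == skewed.values.sum)
    // ...while the per-partition view distinguishes them.
    assert(balanced("p=a") != skewed("p=a"))
  }
}
```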
snmvaughan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2084197846
We're looking to collect deeper insights into what jobs are doing, beyond
the current read/write statistics such as bytes, num files, etc.
snmvaughan commented on code in PR #46188:
URL: https://github.com/apache/spark/pull/46188#discussion_r1584036371
Reviewed file: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
Hunk: @@ -223,6 +278,9 @@ class BasicWriteJobStatsTracker(
snmvaughan commented on code in PR #46188:
URL: https://github.com/apache/spark/pull/46188#discussion_r1584033823
Reviewed file: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
Hunk: @@ -43,10 +44,18 @@ case class BasicWriteTaskStats(
cloud-fan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2081399740
is the end goal to automatically update table statistics?
cloud-fan commented on code in PR #46188:
URL: https://github.com/apache/spark/pull/46188#discussion_r1582063564
Reviewed file: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
Hunk: @@ -223,6 +278,9 @@ class BasicWriteJobStatsTracker(
dbtsai commented on code in PR #46188:
URL: https://github.com/apache/spark/pull/46188#discussion_r1581313246
Reviewed file: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
Hunk: @@ -213,6 +260,14 @@ class BasicWriteJobStatsTracker(
dbtsai commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2079779162
Gently pinging @cloud-fan
snmvaughan commented on PR #46188:
URL: https://github.com/apache/spark/pull/46188#issuecomment-2072876115
cc @cloud-fan
snmvaughan opened a new pull request, #46188:
URL: https://github.com/apache/spark/pull/46188
We currently capture metrics that include the number of files, bytes, and
rows written per task, along with the set of updated partitions.
This change captures metrics for each updated partition, reporting
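The per-partition idea in the PR description can be sketched in miniature. The following is a hedged, self-contained Scala example, not the actual BasicWriteStatsTracker implementation; the WriteStats case class and the aggregate helper are invented for illustration. It merges per-task maps of (partition -> stats) into a single job-level view keyed by partition, rather than folding everything into one total:

```scala
// Sketch only (hypothetical names, not Spark's API): per-partition
// write stats that can be merged across tasks.
case class WriteStats(numFiles: Long, numBytes: Long, numRows: Long) {
  def merge(other: WriteStats): WriteStats =
    WriteStats(numFiles + other.numFiles,
               numBytes + other.numBytes,
               numRows + other.numRows)
}

object PartitionStatsDemo {
  // Fold per-task (partition -> stats) maps into one job-level map,
  // merging entries that share a partition key.
  def aggregate(taskStats: Seq[Map[String, WriteStats]]): Map[String, WriteStats] =
    taskStats.foldLeft(Map.empty[String, WriteStats]) { (acc, perTask) =>
      perTask.foldLeft(acc) { case (a, (part, s)) =>
        a.updated(part, a.getOrElse(part, WriteStats(0, 0, 0)).merge(s))
      }
    }

  def main(args: Array[String]): Unit = {
    val task1 = Map("date=2024-01-01" -> WriteStats(1, 100, 10))
    val task2 = Map("date=2024-01-01" -> WriteStats(2, 200, 20),
                    "date=2024-01-02" -> WriteStats(1, 50, 5))
    val merged = aggregate(Seq(task1, task2))
    assert(merged("date=2024-01-01") == WriteStats(3, 300, 30))
    assert(merged("date=2024-01-02") == WriteStats(1, 50, 5))
  }
}
```

Note the design choice this mirrors: each task reports only what it wrote, and a commutative, associative merge makes the job-level aggregation order-independent.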