otterc commented on pull request #33446: URL: https://github.com/apache/spark/pull/33446#issuecomment-887937290
> > And I think a more straightforward way to do this probably would be to log slow fetches which are slower than most fetches. For example, we could log the 5%th percentile slow fetches among all fetches. The downside is that we have to store more intermediate states.
>
> This is a great idea actually. We can use [Dropwizard's Histogram with a uniform reservoir](https://metrics.dropwizard.io/4.1.2/manual/core.html#histograms) to keep histogram information with constant storage space.

I have a different opinion here.
- The number of requests that are in flight is bounded by parameters like `maxBytesInFlight` and `maxReqsInFlight`. The number of blocks to fetch in each network request is bounded by these params as well. So, even for applications with larger shuffles, these limits cap the amount of shuffle data being fetched at any one time. When a fetch is slow (bytes fetched per second below the threshold), it is more likely a server issue, which depends on how long the shuffle server took to respond to the request. So, I don't think the threshold for marking a fetch slow would vary from application to application in a similar cluster.
- The slow fetches are not necessarily caused by this particular application. They could be caused by a shuffle server that is overloaded with requests from other apps running in the same cluster. So, again, the threshold for categorizing a fetch as slow mostly depends on the cluster, and the default should be something the cluster admins can determine for that cluster.
- Usually, the user doesn't change configs like `maxBytesInFlight` and `maxReqsInFlight`, so they shouldn't have to change the configs related to this either.
- Storing the data for fetches adds more state to the iterator, even though we can bound it. Adding more code around this seems like overkill.

For a user, this log is helpful to identify why one run of an application took longer than a previous run.
If most of the runs of the same application have the same time, they will not even bother with it.
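
To make the two options in this thread concrete, here is a rough Python sketch of both: a fixed throughput threshold of the kind argued for above, and a fixed-size uniform sample that could back a percentile-based rule. It is not code from the PR. `is_slow_fetch`, `min_bytes_per_sec`, and the `UniformReservoir` class are hypothetical names used only for illustration, and the reservoir is a plain Algorithm R sampler standing in for Dropwizard's histogram rather than a port of it.

```python
import random


def is_slow_fetch(bytes_fetched, elapsed_ms, min_bytes_per_sec):
    """Fixed-threshold rule: flag a fetch whose throughput is below the limit.

    min_bytes_per_sec is a hypothetical cluster-wide setting, not a real
    Spark config name.
    """
    if elapsed_ms <= 0:
        return False  # too fast to measure, so never flag it
    bytes_per_sec = bytes_fetched * 1000.0 / elapsed_ms
    return bytes_per_sec < min_bytes_per_sec


class UniformReservoir:
    """Fixed-size uniform sample of a stream (Vitter's Algorithm R).

    Memory stays constant no matter how many fetches are recorded, which is
    the extra per-iterator state the percentile approach would need.
    """

    def __init__(self, size, seed=None):
        self.size = size
        self.count = 0
        self.values = []
        self._rng = random.Random(seed)

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            self.values.append(value)
        else:
            # Keep the new value with probability size / count.
            slot = self._rng.randrange(self.count)
            if slot < self.size:
                self.values[slot] = value

    def quantile(self, q):
        """Simple rank-based estimate of the q-th quantile of the sample."""
        if not self.values:
            raise ValueError("no samples recorded")
        ordered = sorted(self.values)
        index = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[index]
```

With the fixed rule, a 1 MiB fetch that takes 4 seconds (262144 bytes/sec) is flagged against an assumed 1 MiB/sec threshold, so `is_slow_fetch(1 << 20, 4000, 1 << 20)` returns `True`. The percentile rule would instead call `update` on every fetch and compare each new throughput against `quantile(0.05)`, which is where the additional state and code come from.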
