HeartSaVioR commented on pull request #31944: URL: https://github.com/apache/spark/pull/31944#issuecomment-828804207
> > > I've tested it on real cluster and works fine.
> > > Just a question. How this it intended to use for dynamic allocation?
> >
> > Users can implement this interface in their customized SparkDataStream and know how far falling behind through the progress listener. Maybe this can provide more useful information to guide/trigger the auto scaling.
>
> This is a valid use-case. But my question is that current offsets in `SourceProgress` should already provide the information the use-case needs (consumed offset, available offset).

That is what I understand as well - it is just a matter of "where" we want to put the calculation. I have mixed feelings about this:

1) If the target persona is a human, then I'd rather not make them calculate it themselves. It would be helpful to let Spark calculate and provide the information instead.
2) If the target persona is a "process" (maybe the Spark driver or some external app?), then it should not be that hard for it to do the calculation itself.

Not sure which is the actual use case for this PR.
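To illustrate the "process" persona point: the consumed/available offsets already exposed in `SourceProgress` are enough to derive lag externally. Below is a minimal sketch of that calculation, assuming Kafka-style offsets where `endOffset` (consumed) and `latestOffset` (available) are JSON maps of topic to partition to offset; the function name `kafka_lag` is hypothetical, not part of any Spark API.

```python
import json

def kafka_lag(end_offset_json: str, latest_offset_json: str) -> int:
    """Total record lag across all partitions, derived from the
    consumed (endOffset) and available (latestOffset) JSON strings
    that a Kafka source reports in its SourceProgress."""
    consumed = json.loads(end_offset_json)
    available = json.loads(latest_offset_json)
    total = 0
    for topic, partitions in available.items():
        for partition, latest in partitions.items():
            # Partitions not yet consumed default to offset 0.
            done = consumed.get(topic, {}).get(partition, 0)
            total += max(0, latest - done)
    return total

# Example: partition 0 is 10 records behind, partition 1 is caught up.
print(kafka_lag('{"t": {"0": 90, "1": 50}}',
                '{"t": {"0": 100, "1": 50}}'))  # 10
```

An external autoscaler could run this over the `sources` array of each `StreamingQueryProgress` event and scale when the lag trends upward, which is the kind of calculation the discussion weighs doing inside versus outside Spark.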
