HeartSaVioR commented on pull request #31944: URL: https://github.com/apache/spark/pull/31944#issuecomment-828804207
> > > I've tested it on real cluster and works fine.
> > > Just a question. How this it intended to use for dynamic allocation?
> >
> > Users can implement this interface in their customized SparkDataStream and know how far falling behind through the progress listener. Maybe this can provide more useful information to guide/trigger the auto scaling.
>
> This is a valid use-case. But my question is that current offsets in `SourceProgress` should already provide the information the use-case needs (consumed offset, available offset).

That is what I understand as well - it is just a matter of "where" we want to put the calculation. I have mixed feelings about this:

1) If the target persona is a human, then I'd rather not make them calculate it themselves. It would be helpful to let Spark calculate and provide the information instead.
2) If the target persona is a "process" (maybe the Spark driver or some external app?), then it should not be that hard for it to do the calculation itself.

Not sure which is the actual use case for this PR.
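To illustrate the "process" persona point: the consumed/available offsets already exposed in `SourceProgress` are enough to derive lag externally. Below is a minimal sketch of that calculation, assuming Kafka-style offsets where `endOffset` (consumed) and `latestOffset` (available) are JSON maps of topic to partition to offset; the function name `kafka_lag` is hypothetical, not part of any Spark API.

```python
import json

def kafka_lag(end_offset_json: str, latest_offset_json: str) -> int:
    """Total record lag across all partitions, derived from the
    consumed (endOffset) and available (latestOffset) JSON strings
    that a Kafka source reports in its SourceProgress."""
    consumed = json.loads(end_offset_json)
    available = json.loads(latest_offset_json)
    total = 0
    for topic, partitions in available.items():
        for partition, latest in partitions.items():
            # Partitions not yet consumed default to offset 0.
            done = consumed.get(topic, {}).get(partition, 0)
            total += max(0, latest - done)
    return total

# Example: partition 0 is 10 records behind, partition 1 is caught up.
print(kafka_lag('{"t": {"0": 90, "1": 50}}',
                '{"t": {"0": 100, "1": 50}}'))  # 10
```

An external autoscaler could run this over the `sources` array of each `StreamingQueryProgress` event and scale when the lag trends upward, which is the kind of calculation the discussion weighs doing inside versus outside Spark.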
