yijiacui-db commented on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-828836467


   > > > > I've tested it on a real cluster and it works fine.
   > > > > Just a question: how is this intended to be used for dynamic allocation?
   > > > 
   > > > 
   > > > Users can implement this interface in their customized SparkDataStream and learn how far it is falling behind through the progress listener. Maybe this can provide more useful information to guide/trigger the auto-scaling.
   > 
   > > This is a valid use case. But my question is whether the current offsets in `SourceProgress` already provide the information the use case needs (consumed offset, available offset).
   > 
   > That is what I understand as well - it is just a matter of "where" we want to put the calculation.
   > 
   > I have mixed feelings about this:
   > 
   > 1. If the target persona is a human, then I'd rather not make them calculate it by themselves. It would be helpful to let Spark calculate and provide the information instead.
   > 2. If the target persona is a "process" (maybe the Spark driver or some external app?), then it should not be that hard for it to calculate this by itself.
   > 
   > Not sure which is the actual use case for this PR.
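
   For reference, the offset-based calculation described in the quote could live in a listener roughly like the sketch below. This is only an illustration: it assumes a Spark version where `SourceProgress` exposes `latestOffset` next to `endOffset`, and `estimateLag` is a hypothetical, source-specific helper for parsing the offset JSON.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Sketch of a listener that derives "how far behind" purely from the offsets
// that SourceProgress already reports. The offset strings are source-specific
// (e.g. Kafka reports a JSON map of topic -> partition -> offset), so the
// actual subtraction is delegated to a source-specific helper.
class LagEstimatingListener extends StreamingQueryListener {

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources.foreach { source =>
      val consumed  = source.endOffset     // what the last batch read up to
      val available = source.latestOffset  // what the source said was available
      println(s"source=${source.description} estimated lag=" +
        estimateLag(consumed, available))
    }
  }

  // Hypothetical helper: parse the offset JSON for the concrete source and
  // sum the per-partition differences.
  private def estimateLag(consumed: String, available: String): Long = 0L
}

// Registering the listener on a SparkSession:
// spark.streams.addListener(new LagEstimatingListener())
```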
   
   @HeartSaVioR This is a good question! I already updated my answer in the comment above to explain how this works and why we need this metrics interface. Whether the target persona is a human or a process, it's always possible that the latest available position is something internal to the customized Spark data stream and can't be reported as an offset, so it's not possible to calculate the metrics from offsets or report them as offsets. A rough sketch of such a source is below.
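
   The following sketch shows what a customized stream could look like with a metrics interface along the lines of the `ReportsSourceMetrics` shape discussed in this PR. The interface name, the `metrics(Optional[Offset])` signature, and the helper methods are assumptions for illustration only; the point is that the backlog is computed from a position internal to the source that never surfaces as an `Offset`.

```scala
import java.util.Optional

import org.apache.spark.sql.connector.read.{InputPartition, PartitionReaderFactory}
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset, ReportsSourceMetrics}

class MyCustomStream extends MicroBatchStream with ReportsSourceMetrics {

  // Regular MicroBatchStream plumbing, elided here.
  override def initialOffset(): Offset = ???
  override def latestOffset(): Offset = ???
  override def deserializeOffset(json: String): Offset = ???
  override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] = ???
  override def createReaderFactory(): PartitionReaderFactory = ???
  override def commit(end: Offset): Unit = ()
  override def stop(): Unit = ()

  // The source computes its own backlog from state that never surfaces as an
  // Offset and hands Spark a plain string metric, which can then be surfaced
  // in the progress report for listeners to consume directly.
  override def metrics(latestConsumedOffset: Optional[Offset]): java.util.Map[String, String] = {
    val available = fetchInternalHighWatermark()      // internal position, not an Offset
    val consumed  = positionOf(latestConsumedOffset)  // may be empty before the first batch
    val result = new java.util.HashMap[String, String]()
    result.put("estimatedBacklog", (available - consumed).toString)
    result
  }

  // Hypothetical stand-ins for source-internal bookkeeping.
  private def fetchInternalHighWatermark(): Long = ???
  private def positionOf(offset: Optional[Offset]): Long =
    if (offset.isPresent) 0L /* decode from the source's own offset type */ else 0L
}
```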

