GrigorievNick commented on pull request #27380: URL: https://github.com/apache/spark/pull/27380#issuecomment-805592781
Hi, I know these changes are already in Spark 3, but I have a question: how can I configure backpressure for my job when I want to use Trigger.Once? In Spark 2.4 I have a use case where I backfill some data and then start the stream. So I use Trigger.Once, but my backfill scenario can be very large, and it sometimes puts too big a load on my disks (because of shuffles) and on driver memory (because the FileIndex is cached there). So I use `maxOffsetsPerTrigger` and `maxFilesPerTrigger` to control how much data Spark processes per batch; that's how I configure backpressure. Now that this ability is removed, I assume you can suggest a better way to go?
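To make the workflow concrete, here is a minimal Scala sketch of the Spark 2.4-style setup the comment describes: a rate-limited file-source backfill run under Trigger.Once. The paths, schema, and app name are hypothetical, and whether Trigger.Once honors the rate-limit options is exactly the behavior under discussion in this PR, so treat this as an illustration of the configuration, not a guarantee of batch sizing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object BackfillSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("backfill-sketch")   // hypothetical app name
      .getOrCreate()

    // File streams require an explicit schema; this one is made up.
    val schema = new StructType()
      .add("id", LongType)
      .add("value", StringType)

    val input = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", 100)  // cap files read per micro-batch
      .parquet("/data/backfill")          // hypothetical input path

    // Trigger.Once: run the available data, then stop, so the same job
    // can later be restarted as a continuous stream.
    val query = input.writeStream
      .format("parquet")
      .option("path", "/data/out")                 // hypothetical output path
      .option("checkpointLocation", "/chk/backfill") // hypothetical checkpoint
      .trigger(Trigger.Once())
      .start()

    query.awaitTermination()
  }
}
```

For a Kafka source, the analogous knob is `.option("maxOffsetsPerTrigger", ...)` on the `readStream` side; the rest of the pattern is the same.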
