GrigorievNick commented on pull request #27380: URL: https://github.com/apache/spark/pull/27380#issuecomment-805592781
Hi, I know these changes are already in Spark 3, but I have a question: how can I configure backpressure for my job when I want to use Trigger.Once? In Spark 2.4 I have a use case where I backfill some data and then start the stream. So I use Trigger.Once, but my backfill scenario can be very large, and it sometimes puts too big a load on my disks (because of shuffles) and on driver memory (because the FileIndex is cached there). So I use `maxOffsetsPerTrigger` and `maxFilesPerTrigger` to control how much data Spark processes per batch; that's how I configure backpressure. Now that this ability is removed, I assume you can suggest a better way to go?
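To make the workflow concrete, here is a minimal Scala sketch of the Spark 2.4-style setup the comment describes: a rate-limited file-source backfill run under Trigger.Once. The paths, schema, and app name are hypothetical, and whether Trigger.Once honors the rate-limit options is exactly the behavior under discussion in this PR, so treat this as an illustration of the configuration, not a guarantee of batch sizing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object BackfillSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("backfill-sketch")   // hypothetical app name
      .getOrCreate()

    // File streams require an explicit schema; this one is made up.
    val schema = new StructType()
      .add("id", LongType)
      .add("value", StringType)

    val input = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", 100)  // cap files read per micro-batch
      .parquet("/data/backfill")          // hypothetical input path

    // Trigger.Once: run the available data, then stop, so the same job
    // can later be restarted as a continuous stream.
    val query = input.writeStream
      .format("parquet")
      .option("path", "/data/out")                 // hypothetical output path
      .option("checkpointLocation", "/chk/backfill") // hypothetical checkpoint
      .trigger(Trigger.Once())
      .start()

    query.awaitTermination()
  }
}
```

For a Kafka source, the analogous knob is `.option("maxOffsetsPerTrigger", ...)` on the `readStream` side; the rest of the pattern is the same.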
