brkyvz opened a new pull request #27380: [SPARK-30669][SS] Introduce 
AdmissionControl APIs for StructuredStreaming
URL: https://github.com/apache/spark/pull/27380
 
 
   ### What changes were proposed in this pull request?
   
   We propose to add two new interfaces, `SupportsAdmissionControl` and 
`ReadLimit`. A `ReadLimit` defines how much data should be read in the next 
micro-batch. `SupportsAdmissionControl` specifies that a source can rate limit 
its ingest into the system. The source can tell the system what the user 
specified as a read limit, and the system can enforce this limit within each 
micro-batch or impose its own limit, for example if the Trigger is 
Trigger.Once().
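
The interaction described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `ReadLimitSketch`, the nested types `AllAvailable`, `MaxRows`, and `CountingSource`, the method names, and the helper `planBatchEnd` are all hypothetical stand-ins, and offsets are modeled as plain `long` values for simplicity.

```java
// Hypothetical, simplified stand-ins for the proposed interfaces.
// None of these signatures are taken from Spark itself.
final class ReadLimitSketch {

    // A read limit says how much data the next micro-batch may read.
    interface ReadLimit {}

    // Read everything that is currently available.
    static final class AllAvailable implements ReadLimit {}

    // Read at most `rows` records.
    static final class MaxRows implements ReadLimit {
        final long rows;
        MaxRows(long rows) { this.rows = rows; }
    }

    // A source that can rate limit its own ingest.
    interface SupportsAdmissionControl {
        // The limit the user configured on the source.
        ReadLimit defaultReadLimit();
        // The end offset for the next micro-batch under the given limit.
        long latestOffset(long startOffset, ReadLimit limit);
    }

    // Toy source holding offsets [0, availableEnd), with a user-configured
    // cap that plays the role of an option like maxOffsetsPerTrigger.
    static final class CountingSource implements SupportsAdmissionControl {
        private final long availableEnd;
        private final long maxPerTrigger;

        CountingSource(long availableEnd, long maxPerTrigger) {
            this.availableEnd = availableEnd;
            this.maxPerTrigger = maxPerTrigger;
        }

        @Override
        public ReadLimit defaultReadLimit() {
            return new MaxRows(maxPerTrigger);
        }

        @Override
        public long latestOffset(long startOffset, ReadLimit limit) {
            if (limit instanceof MaxRows) {
                long capped = startOffset + ((MaxRows) limit).rows;
                return Math.min(availableEnd, capped);
            }
            // AllAvailable (or any unknown limit): read to the end.
            return availableEnd;
        }
    }

    // Engine side: enforce the source's limit per micro-batch, but impose
    // "all available" when the query runs with a Trigger.Once()-style trigger.
    static long planBatchEnd(SupportsAdmissionControl source,
                             long startOffset,
                             boolean triggerOnce) {
        ReadLimit limit = triggerOnce ? new AllAvailable() : source.defaultReadLimit();
        return source.latestOffset(startOffset, limit);
    }
}
```

In this sketch the source only reports the user's limit, and the engine decides which limit applies to a given micro-batch, so a Trigger.Once()-style run is not silently truncated by a per-trigger cap.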
   
   ### Why are the changes needed?
   
   Sources currently have no information about execution semantics, such as 
whether the stream is being executed in Trigger.Once() mode. This interface 
will pass this information into the sources as part of planning. With a trigger 
like Trigger.Once(), the semantics are to process all the data available to the 
datasource in a single micro-batch. However, without this interface, this 
semantic can be broken when data source options such as `maxOffsetsPerTrigger` 
(in the Kafka source) rate limit the amount of data read for that micro-batch.
   
   ### Does this PR introduce any user-facing change?
   
   DataSource developers can extend this interface for their streaming sources 
to add admission control into their system and correctly support Trigger.Once().
   
   ### How was this patch tested?
   
   Existing tests, as this API is mostly internal
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

