skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for 
committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494313373
 
 
   @gaborgsomogyi 
   > Something like monitoringGroupId

   One option is to pass a fully predictable groupId at the point when the query object is created (start), but it seems I can do that here as well: from what I see, options are defined per source and can be used when the relations are transformed into the logical plan.
   
   @HeartSaVioR 
   > EDIT: This case might be still an issue even if query ID is being used as 
consumer ID - This may also occur when queries are started/restarted in quick 
succession.
   
   Could you elaborate a bit more on this? Since the streaming object is kept around on a quick restart, I think we could simply start assigning the ids again from the beginning. Do you mean there will be issues because the same id is re-used within a short period of time? In that case, I believe the brokers are notified anyway, since the consumer is closed during a graceful shutdown of the query.
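   To illustrate the graceful-shutdown point above: closing a Kafka consumer sends a LeaveGroup request to the group coordinator, so the broker learns immediately that the member is gone. A minimal sketch with the plain Kafka client (broker address and group id are placeholders, not values from this PR):

   ```scala
   import java.util.Properties
   import org.apache.kafka.clients.consumer.KafkaConsumer

   val props = new Properties()
   props.put("bootstrap.servers", "localhost:9092") // placeholder broker
   props.put("group.id", "query-group-1")           // the id we want to re-use
   props.put("key.deserializer",
     "org.apache.kafka.common.serialization.StringDeserializer")
   props.put("value.deserializer",
     "org.apache.kafka.common.serialization.StringDeserializer")

   val consumer = new KafkaConsumer[String, String](props)
   // ... consume on behalf of the query ...

   // A graceful query shutdown closes the consumer, which sends a
   // LeaveGroup request to the coordinator; the same group id can then
   // be re-used on a quick restart without waiting for the session
   // timeout to expire.
   consumer.close()
   ```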
   
   My plan is to do the following:
   a) Allow the user to pass a groupId per source via the `monitoringGroupId` option, which will take effect only when the user also sets the `commitOffsetsOnCheckpoints` flag.
   b) On query restarts we start re-assigning the ids from scratch. When queries run in parallel, we make sure the ids are distinct unless the user makes them identical on purpose.
   
   
