[GitHub] [spark] gaborgsomogyi commented on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling

GitBox Mon, 20 May 2019 08:02:44 -0700

gaborgsomogyi commented on issue #24613: [SPARK-27549][SS] Add support for 
committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494025320
 
 
   > Is there a way for the user to optionally provide a full group.id per Spark Query?
   
   ```
     The Kafka group id to use in Kafka consumer while reading from Kafka. Use this with caution.
     By default, each query generates a unique group id for reading data. This ensures that each Kafka
     source has its own consumer group that does not face interference from any other consumer, and
     therefore can read all of the partitions of its subscribed topics. In some scenarios (for example,
     Kafka group-based authorization), you may want to use a specific authorized group id to read data.
     You can optionally set the group id. However, do this with extreme caution as it can cause
     unexpected behavior. Concurrently running queries (both batch and streaming) or sources with the
     same group id are likely to interfere with each other, causing each query to read only part of the
     data. This may also occur when queries are started/restarted in quick succession. To minimize such
     issues, set the Kafka consumer session timeout (by setting option "kafka.session.timeout.ms") to
     be very small. When this is set, option "groupIdPrefix" will be ignored.
   ```
   At the moment a fixed group id is not the default, and forcing users to set one purely for lag monitoring is not a good direction.
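
For readers who want to see what opting in looks like, here is a minimal sketch of building the source options described in the quoted documentation. The helper `kafka_source_options` and the sample values are hypothetical. The option name `kafka.group.id` is not stated in the quoted text; it is assumed here from the Spark Kafka integration guide, so check it against your Spark version. The sketch only models the documented rule that `groupIdPrefix` is ignored once a fixed group id is set.

```python
from typing import Dict, Optional


def kafka_source_options(
    bootstrap_servers: str,
    topic: str,
    group_id: Optional[str] = None,
    group_id_prefix: Optional[str] = None,
    session_timeout_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Build an option map for a Spark Kafka source (hypothetical helper)."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }
    if group_id is not None:
        # Assumed option name; a fixed group id makes groupIdPrefix irrelevant,
        # so the prefix is deliberately not emitted in this branch.
        options["kafka.group.id"] = group_id
    elif group_id_prefix is not None:
        # Default behaviour: Spark still generates a unique id, using this prefix.
        options["groupIdPrefix"] = group_id_prefix
    if session_timeout_ms is not None:
        # The quoted docs suggest a small session timeout to limit interference.
        options["kafka.session.timeout.ms"] = str(session_timeout_ms)
    return options


# Sample values are placeholders, not recommendations.
opts = kafka_source_options(
    "host1:9092", "events", group_id="authorized-group", session_timeout_ms=6000
)
# With PySpark installed and a session available, the map could be applied as:
# df = spark.readStream.format("kafka").options(**opts).load()
```

Queries that share the same `group_id` this way can still interfere with each other, as the quoted text warns, which is why this stays opt-in.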


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

