[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259280#comment-17259280 ] L. C. Hsieh commented on SPARK-33833: - Thanks [~kabhwan]. > Allow Spark Structured Streaming report Kafka Lag through Burrow > > > Key: SPARK-33833 > URL: https://issues.apache.org/jira/browse/SPARK-33833 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.1 >Reporter: Sam Davarnia >Priority: Major > > Because structured streaming tracks Kafka offset consumption by itself, > It is not possible to track total Kafka lag using Burrow similar to DStreams > We have used Stream hooks as mentioned > [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37] > > It would be great if Spark supports this feature out of the box. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259263#comment-17259263 ] Jungtaek Lim commented on SPARK-33833: (Just to be clear, I was talking about setting the group ID for the main purpose, not about SPARK-27549. If I were objecting to that, I wouldn't have created my own project. If you think this is good to have, please start a discussion thread to gather consensus, so that we can make progress without the risk of another soft rejection.)
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259094#comment-17259094 ] L. C. Hsieh commented on SPARK-33833: I will close this for now as a duplicate.
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258726#comment-17258726 ] L. C. Hsieh commented on SPARK-33833: Yeah, but this can easily be overcome. We just need a user-provided group ID that is used only for committing offsets. Since users must specify it explicitly when they want to commit offsets and track progress, they will use it with caution. Even for committing with the static group ID that users can currently provide, I don't think the risk is really a reason to reject the idea of committing offsets. Once users decide to commit offsets and track progress, they should be aware of the risk. Anyway, this doesn't seem to be the reason the previous PR was closed.
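To make the idea of a separate, user-provided commit group concrete, here is a minimal Python sketch. `commit_request` is a hypothetical helper, not a Spark or Kafka API: it only prepares what would be committed and leaves the actual commit to whichever Kafka client you use. It assumes the Kafka source's offset JSON layout `{"topic": {"partition": offset}}`, as seen in a batch's `endOffset`.

```python
import json
from typing import Dict, Optional, Tuple


def commit_request(
    end_offset_json: str, commit_group_id: Optional[str]
) -> Tuple[str, Dict[Tuple[str, int], int]]:
    """Turn a Kafka source offset JSON (e.g. a batch's endOffset) into commit entries.

    Committing is opt-in: without an explicit, user-provided group ID nothing
    should be committed, so this hypothetical helper refuses instead of inventing one.
    """
    if not commit_group_id:
        raise ValueError("a user-provided group ID is required to commit offsets")
    offsets = json.loads(end_offset_json)  # {"topic": {"partition": offset}}
    entries = {
        (topic, int(partition)): offset
        for topic, partitions in offsets.items()
        for partition, offset in partitions.items()
        # Defensively skip negative sentinels (-1 = latest, -2 = earliest).
        if offset >= 0
    }
    return commit_group_id, entries


group, entries = commit_request('{"events": {"0": 120, "1": 95}}', "my-app-lag-tracking")
print(group, entries)  # my-app-lag-tracking {('events', 0): 120, ('events', 1): 95}
```

Spark's Kafka source offsets point at the next record to read, which as far as I understand matches Kafka's convention for committed offsets, so the values can likely be passed through unchanged. The group name `my-app-lag-tracking` is purely illustrative.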
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258717#comment-17258717 ] Jungtaek Lim commented on SPARK-33833: That's available, but only with serious caution. Spark has to have full control of offset management, and it shouldn't be touched from outside in any way. Generating a unique group ID is a defensive measure that prevents end users from messing this up by accident. Once end users set a static group ID, that guard no longer applies.
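The trade-off between a generated group ID and a static one can be sketched as follows. This is a simplified illustration, not Spark's actual code: `resolve_group_id` and its `query_id` argument are hypothetical, while `kafka.group.id` and `groupIdPrefix` (default `spark-kafka-source`) are the option names documented for the Kafka source in Spark 3.x.

```python
import uuid
from typing import Dict

# Assumption: mirrors the documented default of the `groupIdPrefix` option.
DEFAULT_PREFIX = "spark-kafka-source"


def resolve_group_id(options: Dict[str, str], query_id: str) -> str:
    """Pick the consumer group ID for one streaming query (illustrative only).

    A static ID from `kafka.group.id` is used verbatim, so every query or
    external tool configured with it shares one group. Otherwise a fresh random
    ID is generated on each call, so no two queries collide by accident.
    """
    static_id = options.get("kafka.group.id")
    if static_id:
        return static_id
    prefix = options.get("groupIdPrefix", DEFAULT_PREFIX)
    return f"{prefix}-{uuid.uuid4()}-{query_id}"
```

The random branch is the guard described above: nothing outside the query can predict the ID, so nothing outside can interfere with that group's offsets. The static branch removes that guard.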
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258716#comment-17258716 ] L. C. Hsieh commented on SPARK-33833: I read through the comments in the previous PR. The approach is pretty similar to what I did locally, so I guess that, unless something changes, it won't be accepted into the Spark codebase either.
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258707#comment-17258707 ] L. C. Hsieh commented on SPARK-33833: Btw, thanks for providing the useful link to the previous ticket/PR, [~kabhwan].
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258705#comment-17258705 ] L. C. Hsieh commented on SPARK-33833: I think SS allows users to specify a custom group ID, doesn't it?
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258695#comment-17258695 ] Jungtaek Lim commented on SPARK-33833: For SS, the consumer group ID is randomly generated by intention, which is the actual obstacle to leveraging the offset information with the Kafka ecosystem. SPARK-27549 was meant to address this, but unfortunately it was soft-rejected for inclusion in the Spark repository. Instead of pushing it further, I've just crafted the project in my own repository: https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258693#comment-17258693 ] L. C. Hsieh commented on SPARK-33833: [~samdvr] Could you elaborate on the question above? Thanks.
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258692#comment-17258692 ] L. C. Hsieh commented on SPARK-33833: Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset progress back to Kafka? I added some code to commit offset progress to Kafka. When I checked Kafka's "__consumer_offsets" topic, I found that whether or not Spark commits the progress to Kafka, a record for the consumer group of the Spark SS query is always present in "__consumer_offsets". According to https://github.com/linkedin/Burrow/wiki, Burrow reads consumer group information from this "__consumer_offsets" topic. So if there is a record for the consumer group whether or not Spark commits, does that mean Burrow still works without Spark committing offset progress to Kafka?
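The question hinges on what a lag monitor can compute from a group record. In the usual definition, lag for a partition is the log-end offset minus the group's committed offset for that partition, so a group with no committed offset for a partition leaves nothing to subtract from. The sketch below is a hypothetical illustration of that definition only; Burrow's own evaluation (a sliding window over committed offsets) is more involved.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]


def consumer_lag(
    log_end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """Lag per partition = log-end offset minus committed offset.

    A partition with no committed offset for the group maps to None: without a
    commit there is nothing to compare against, so no lag can be derived.
    """
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end in log_end_offsets.items():
        committed = committed_offsets.get(tp)
        lag[tp] = None if committed is None else max(0, end - committed)
    return lag


ends = {("events", 0): 150, ("events", 1): 95}
print(consumer_lag(ends, {("events", 0): 120}))
# {('events', 0): 30, ('events', 1): None}
```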
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251926#comment-17251926 ] L. C. Hsieh commented on SPARK-33833: I also need to track the lag, though I may not need Burrow. I will try the first option as a first step.
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251925#comment-17251925 ] L. C. Hsieh commented on SPARK-33833: Hmm, since SS stores source offsets itself in its own storage (the checkpoint), it doesn't commit them back to Kafka. I think we can do a few things here:
1. Expose the latest offsets of sources in StreamingQueryProgress, so users can track the lag themselves. However, since these offsets are not actually committed back to Kafka, users still cannot use Burrow to track it.
2. Commit source offsets back to Kafka under a dummy consumer group in KafkaSource, so users can track the lag using Burrow.
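Option 1 would let users compute lag from progress reports alone. A minimal sketch, assuming each Kafka source entry in the progress JSON carries both `endOffset` and a `latestOffset` field in the `{"topic": {"partition": offset}}` layout (treat the `latestOffset` field name as an assumption and check it against your Spark version). `lag_from_progress` is a hypothetical helper, and the sample values are illustrative.

```python
import json
from typing import Dict, Tuple


def lag_from_progress(source_progress: dict) -> Dict[Tuple[str, int], int]:
    """Per-partition lag = latestOffset - endOffset for one Kafka source entry."""
    end = source_progress["endOffset"]
    latest = source_progress["latestOffset"]
    # Offsets may arrive as JSON strings or as already-parsed dicts.
    if isinstance(end, str):
        end = json.loads(end)
    if isinstance(latest, str):
        latest = json.loads(latest)
    lag: Dict[Tuple[str, int], int] = {}
    for topic, partitions in latest.items():
        for partition, latest_offset in partitions.items():
            processed = end.get(topic, {}).get(partition)
            if processed is None:
                continue  # partition not yet processed by this query; skipped here
            lag[(topic, int(partition))] = max(0, latest_offset - processed)
    return lag


source = {
    "endOffset": {"events": {"0": 120, "1": 95}},
    "latestOffset": {"events": {"0": 150, "1": 95}},
}
print(lag_from_progress(source))  # {('events', 0): 30, ('events', 1): 0}
```

As the comment notes, this tracks lag without committing anything to Kafka, so it does not make the query visible to Burrow.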
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251512#comment-17251512 ] Hyukjin Kwon commented on SPARK-33833: Looks like it leverages listeners. Can you use QueryExecutionListener instead?