[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758431#comment-15758431 ] ASF GitHub Bot commented on BEAM-1126: -- Github user aviemzur closed the pull request at: https://github.com/apache/incubator-beam/pull/1574 > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749908#comment-15749908 ] Daniel Halperin commented on BEAM-1126: --- So https://issues.apache.org/jira/browse/BEAM-774 is a relevant subtask? > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749901#comment-15749901 ] Ben Chambers commented on BEAM-1126: The majority of the work is getting Metrics supported well-enough to start removing Aggregators and moving code/examples/documentation towards Metrics. This work (for Java) is tracked in this issue http://issues.apache.org/jira/browse/BEAM-147. I'm working on the Dataflow runner changes right now. The other runners could choose to implement Metrics either in a way similar to how they currently support Aggregators (providing the "committed" value across work that succeeded) or using their own Metrics mechanisms (providing an "attempted" value across all attempts at work). > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747788#comment-15747788 ] Aviem Zur commented on BEAM-1126: - Yeah, that logic is sound. I'd be glad to help with that effort. Are there any open issues to tackle? > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745258#comment-15745258 ] Daniel Halperin commented on BEAM-1126: --- If UnboundedSource supported aggregators/metrics today, would you still want this change? (Assuming the answer is no:) I don't like the idea of complicating core APIs solely as a short term workaround for the issue that metrics aren't supported everywhere. We are actively trying to fix metrics to be usable in more places; it seems a better long-term solution to wait for [or help with!] that effort. (Assuming the answer is yes:) Say more? > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744487#comment-15744487 ] Aviem Zur commented on BEAM-1126: - The intent is indeed to report number of events via a metric/aggregator, but the context for this number is inside the {{UnboundedSource}} implementation, which is why exposing this number via a method is required. > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742473#comment-15742473 ] Daniel Halperin commented on BEAM-1126: --- This [thread on the dev list|https://lists.apache.org/thread.html/03792d43e94b7d1c342617e64511a62a681b7c2c6797055394ff22a8@%3Cdev.beam.apache.org%3E] has the additional context Davor is presumably asking for. I think the confusion is between human-comprehensible and machine-comprehensible. Using {{bytes}} as the measure of backlog was not written with PubSub in mind, it was written because bytes is more directly related to overhead than events. Using bytes also allows for comparison between sources of different types... so {{bytes}} is generally a pretty good signal for runners, and better than {{events}}. If the purpose of exposing {{events}} is purely for human visibility, this is probably indeed better done using metric or aggregator reporting. [~bchambers] has been thinking most about metrics recently, maybe he has additional thoughts? > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740357#comment-15740357 ] Davor Bonaci commented on BEAM-1126: Interesting perspective, thanks [~aviemzur]. I think the primary design goal of the current API was to enable dynamic optimizations, as opposed to monitoring scenarios. The general idea was that the source should provide an indication of amount of pending work, and it was probably thought that the size in bytes better correlates to "work" than the size in terms of number of elements. Basically, it was intended that the consumer of the data is the runner, not the user. That said, monitoring scenarios are possibly even more important. I think the idea there was that the source should publish monitoring metrics directly thought Beam abstractions in a runner-independent way. Then, all runners would get this benefit, with no particular work required, in a metric that makes sense for that source. (However, I don't think a source can do this today -- but this could a different approach for the same problem.) Anyways, I'm sure [~dhalp...@google.com] will comment more ;-) > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739694#comment-15739694 ] Aviem Zur commented on BEAM-1126: - Reasoning: The backlog accessors are a very good indicator for application monitoring. As such, we plan to expose backlog as aggregators in spark-runner. Number of events is more human comprehensible than bytes. Specifically, in Kafka, backlog (or lag) is reasoned about in {{number of messages}}. See: https://kafka.apache.org/documentation#others_monitoring If I understand correctly, for {{PubSub}} it is more common to reason about backlog in bytes, however, the implementation for {{KafkaIO}} seems forced, applying a byte approximation on a value that is originally in {{number of messages}}: {code:java} synchronized long approxBacklogInBytes() { // Note that is an an estimate of uncompressed backlog. if (latestOffset < 0 || nextOffset < 0) { return UnboundedReader.BACKLOG_UNKNOWN; } return Math.max(0, (long) ((latestOffset - nextOffset) * avgRecordSize)); } {code} In conclusion - it seems that the API was written with {{PubSub}} in mind, however, {{Kafka}}, the open source equivalent, relates to backlog in terms of {{number of messages}}. > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738164#comment-15738164 ] ASF GitHub Bot commented on BEAM-1126: -- GitHub user aviemzur opened a pull request: https://github.com/apache/incubator-beam/pull/1574 [BEAM-1126] Expose UnboundedSource split backlog in number of events Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/aviemzur/incubator-beam backlog-num-events Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1574.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1574 commit d550b9350bba3e7799f00ce09234243dc67a699c Author: Aviem ZurDate: 2016-12-10T17:06:06Z [BEAM-1126] Expose UnboundedSource split backlog in number of events > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today {{UnboundedSource}} exposes split backlog in bytes via > {{getSplitBacklogBytes()}} > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
[ https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738092#comment-15738092 ] Davor Bonaci commented on BEAM-1126: Assigning to [~dhalp...@google.com] for thoughts here. > Expose UnboundedSource split backlog in number of events > > > Key: BEAM-1126 > URL: https://issues.apache.org/jira/browse/BEAM-1126 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Aviem Zur >Assignee: Daniel Halperin >Priority: Minor > > Today UnboundedSource exposes split backlog in bytes via > getSplitBacklogBytes() > There is value in exposing backlog in number of events as well, since this > number can be more human comprehensible than bytes. something like > getSplitBacklogEvents() or getSplitBacklogCount(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)