[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758431#comment-15758431
 ] 

ASF GitHub Bot commented on BEAM-1126:
--

Github user aviemzur closed the pull request at:

https://github.com/apache/incubator-beam/pull/1574


> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749908#comment-15749908
 ] 

Daniel Halperin commented on BEAM-1126:
---

So https://issues.apache.org/jira/browse/BEAM-774 is a relevant subtask?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-14 Thread Ben Chambers (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749901#comment-15749901
 ] 

Ben Chambers commented on BEAM-1126:


The majority of the work is getting Metrics supported well-enough to start 
removing Aggregators and moving code/examples/documentation towards Metrics. 
This work (for Java) is tracked in this issue 
http://issues.apache.org/jira/browse/BEAM-147.

I'm working on the Dataflow runner changes right now. The other runners could 
choose to implement Metrics either in a way similar to how they currently 
support Aggregators (providing the "committed" value across work that 
succeeded) or using their own Metrics mechanisms (providing an "attempted" 
value across all attempts at work).

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-14 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747788#comment-15747788
 ] 

Aviem Zur commented on BEAM-1126:
-

Yeah, that logic is sound.
I'd be glad to help with that effort. Are there any open issues to tackle?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-13 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745258#comment-15745258
 ] 

Daniel Halperin commented on BEAM-1126:
---

If UnboundedSource supported aggregators/metrics today, would you still want 
this change?

(Assuming the answer is no:)

I don't like the idea of complicating core APIs solely as a short term 
workaround for the issue that metrics aren't supported everywhere. We are 
actively trying to fix metrics to be usable in more places; it seems a better 
long-term solution to wait for [or help with!] that effort.

(Assuming the answer is yes:)

Say more?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-12 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744487#comment-15744487
 ] 

Aviem Zur commented on BEAM-1126:
-

The intent is indeed to report number of events via a metric/aggregator, but 
the context for this number is inside the {{UnboundedSource}} implementation, 
which is why exposing this number via a method is required.

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742473#comment-15742473
 ] 

Daniel Halperin commented on BEAM-1126:
---

This [thread on the dev 
list|https://lists.apache.org/thread.html/03792d43e94b7d1c342617e64511a62a681b7c2c6797055394ff22a8@%3Cdev.beam.apache.org%3E]
 has the additional context Davor is presumably asking for.

I think the confusion is between human-comprehensible and 
machine-comprehensible. Using {{bytes}} as the measure of backlog was not 
written with PubSub in mind, it was written because bytes is more directly 
related to overhead than events. Using bytes also allows for comparison between 
sources of different types... so {{bytes}} is generally a pretty good signal 
for runners, and better than {{events}}.

If the purpose of exposing {{events}} is purely for human visibility, this is 
probably indeed better done using metric or aggregator reporting. [~bchambers] 
has been thinking most about metrics recently, maybe he has additional thoughts?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-11 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740357#comment-15740357
 ] 

Davor Bonaci commented on BEAM-1126:


Interesting perspective, thanks [~aviemzur].

I think the primary design goal of the current API was to enable dynamic 
optimizations, as opposed to monitoring scenarios. The general idea was that 
the source should provide an indication of amount of pending work, and it was 
probably thought that the size in bytes better correlates to "work" than the 
size in terms of number of elements. Basically, it was intended that the 
consumer of the data is the runner, not the user.

That said, monitoring scenarios are possibly even more important. I think the 
idea there was that the source should publish monitoring metrics directly 
thought Beam abstractions in a runner-independent way. Then, all runners would 
get this benefit, with no particular work required, in a metric that makes 
sense for that source. (However, I don't think a source can do this today -- 
but this could a different approach for the same problem.)

Anyways, I'm sure [~dhalp...@google.com] will comment more ;-)

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-11 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739694#comment-15739694
 ] 

Aviem Zur commented on BEAM-1126:
-

Reasoning: 
The backlog accessors are a very good indicator for application monitoring. As 
such, we plan to expose backlog as aggregators in spark-runner. 
Number of events is more human comprehensible than bytes. Specifically, in 
Kafka, backlog (or lag) is reasoned about in {{number of messages}}. See: 
https://kafka.apache.org/documentation#others_monitoring
If I understand correctly, for {{PubSub}} it is more common to reason about 
backlog in bytes, however, the implementation for {{KafkaIO}} seems forced, 
applying a byte approximation on a value that is originally in {{number of 
messages}}:
{code:java}
synchronized long approxBacklogInBytes() {
  // Note that is an an estimate of uncompressed backlog.
  if (latestOffset < 0 || nextOffset < 0) {
return UnboundedReader.BACKLOG_UNKNOWN;
  }
  return Math.max(0, (long) ((latestOffset - nextOffset) * avgRecordSize));
}
{code}
In conclusion - it seems that the API was written with {{PubSub}} in mind, 
however, {{Kafka}}, the open source equivalent, relates to backlog in terms of 
{{number of messages}}. 

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738164#comment-15738164
 ] 

ASF GitHub Bot commented on BEAM-1126:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/incubator-beam/pull/1574

[BEAM-1126] Expose UnboundedSource split backlog in number of events

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/incubator-beam backlog-num-events

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1574


commit d550b9350bba3e7799f00ce09234243dc67a699c
Author: Aviem Zur 
Date:   2016-12-10T17:06:06Z

[BEAM-1126] Expose UnboundedSource split backlog in number of events




> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-10 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738092#comment-15738092
 ] 

Davor Bonaci commented on BEAM-1126:


Assigning to [~dhalp...@google.com] for thoughts here.

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today UnboundedSource exposes split backlog in bytes via 
> getSplitBacklogBytes()
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> getSplitBacklogEvents() or getSplitBacklogCount().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)