Thanks for bringing up this question, Holden. I have also been thinking
about this for a while, and I have the impression that Beam needs to
expose more ‘system’ metrics to users; so far we have mostly focused
on the user-defined metrics space. However, once anyone starts using
Beam, they naturally need some metrics to monitor the progress of
their pipelines in production.

We have discussed some possible metrics for the IOs in the past,
without much progress. See this proposal for example:
https://lists.apache.org/thread.html/18dd491f704e7bbcf1b6ce895c82e7c3b35981b0300dbc4142a32105@%3Cdev.beam.apache.org%3E

I think this was (and still is) an excellent idea. We should probably
bring it back, add some other metrics provided by the runners, and
expose all of them in a unified way (through the same Beam API). For
inspiration on possible runner metrics, maybe we can look at what each
system already offers. A few weeks ago I watched the talk on monitoring
for Dataflow from Google Next, and there are some interesting ones
there.

Monitoring and improving your big data applications (Google Cloud Next '17)
https://www.youtube.com/watch?v=hEteVlEHa60

I know that each data processing system (e.g. Spark, Flink, Dataflow,
etc.) has its own metrics subsystem, and we can argue whether this
should be a task for Beam, which so far has been just a ‘translation’
layer. But if we really want to get more users into Beam, we need to
offer at least some convenient and unified methods for this kind of
task. Also, with the ‘portability’ effort we will probably need a basic
set of metrics too, to know what is going on inside the SDK harnesses.
