Re: BEAM counters for validation

2017-11-27 Thread Ben Chambers
+1 to what Ismael said.

If Beam is just a "translation" layer it is less useful than if Beam
enables actual portability between runners. Being actually portable
requires more than just translating a pipeline for execution on the runner
-- it means making it possible to pull down metrics and interact with the
pipeline using common APIs without needing to rewrite your code when you
want to switch runners.
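
To make the point concrete, here is a minimal sketch in plain Python (not
the Beam API) of what a common metrics API buys you. Every name in it
(`MetricsSource`, `FakeRunnerA`, `FakeRunnerB`, `validate_job`, and the
counter names) is hypothetical and made up for illustration. The only idea
taken from this thread is that validation code written once against a
shared interface keeps working when the runner changes.

```python
from abc import ABC, abstractmethod


class MetricsSource(ABC):
    """Hypothetical runner-independent way to read counters from a job."""

    @abstractmethod
    def counter(self, name: str) -> int:
        """Return the current value of the named counter (0 if unknown)."""


class FakeRunnerA(MetricsSource):
    """Stand-in for a runner that stores counters in a flat dict."""

    def __init__(self, counters):
        self._counters = dict(counters)

    def counter(self, name):
        return self._counters.get(name, 0)


class FakeRunnerB(MetricsSource):
    """Stand-in for a runner that reports per-worker counts to be summed."""

    def __init__(self, per_worker):
        self._per_worker = list(per_worker)

    def counter(self, name):
        return sum(worker.get(name, 0) for worker in self._per_worker)


def validate_job(metrics: MetricsSource, max_invalid_ratio: float = 0.01) -> bool:
    """Validation logic written once, against the shared interface only."""
    read = metrics.counter("records_read")
    invalid = metrics.counter("records_invalid")
    if read == 0:
        return False  # a job that read nothing is treated as a failure
    return invalid / read <= max_invalid_ratio


# The same check runs unchanged against both stand-in runners.
runner_a = FakeRunnerA({"records_read": 1000, "records_invalid": 5})
runner_b = FakeRunnerB([
    {"records_read": 600, "records_invalid": 2},
    {"records_read": 400, "records_invalid": 3},
])
print(validate_job(runner_a), validate_job(runner_b))
```

Swapping `FakeRunnerA` for `FakeRunnerB` changes nothing in `validate_job`,
which is the property a "translation only" layer would not give you.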

-- Ben

On Mon, Nov 27, 2017 at 1:41 PM Ismaël Mejía wrote:

> Thanks for bringing up this question, Holden. I have also been thinking
> about this for a while, and I have the impression that Beam needs to
> expose more ‘system’ metrics to users. So far we have mostly cared
> about filling the user-defined metrics space. However, once anyone
> starts using Beam, it is normal to need some metrics to monitor the
> progress of pipelines in production.
>
> We have discussed some possible metrics for the IOs in the past,
> without much progress. See this proposal for example:
>
> https://lists.apache.org/thread.html/18dd491f704e7bbcf1b6ce895c82e7c3b35981b0300dbc4142a32105@%3Cdev.beam.apache.org%3E
>
> I think this was/is an excellent idea. We should probably bring it
> back, add some other metrics provided by the runners, and expose all
> of those in a unified way (with the same Beam API). For some
> inspiration on possible metrics for the runners, maybe we can look at
> what each system has. Some weeks ago I saw the talk on monitoring
> for Dataflow from Google Next, and there are some interesting ones
> there.
>
> Monitoring and improving your big data applications (Google Cloud Next '17)
> https://www.youtube.com/watch?v=hEteVlEHa60
>
> I know that each data processing system (e.g. Spark, Flink, Dataflow,
> etc.) has its own metrics sub-system, and we can argue about whether
> this should be a task of Beam, which so far has been just a
> ‘translation’ layer. But if we really want to get more users into Beam,
> we need to offer at least some convenient and unified methods for this
> kind of task. Also, with the ‘portability’ effort we will probably need
> some basic set of metrics to know what is going on inside the SDK
> harnesses.
>


Re: BEAM counters for validation

2017-11-21 Thread Lukasz Cwik
Are we talking about integration testing or general pipeline execution
metrics?

For integration testing, I would expect users to rely on PAssert and a test
runner, similar to Apache Beam's @ValidatesRunner or IO integration tests.
For production pipeline monitoring, the most common metric I have been told
about is the watermark.

But I'm sure others have different opinions.
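
For the production case, one simple way to track the watermark is to alert
on how far it lags behind wall-clock time. The sketch below is plain Python,
not Beam or any runner's API. `watermark_lag`, `is_healthy`, and the
10-minute threshold are hypothetical, and it assumes you can obtain the
current watermark from your runner as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone


def watermark_lag(watermark: datetime, now: datetime) -> timedelta:
    """How far event-time progress (the watermark) trails wall-clock time."""
    # Clamp at zero so a watermark slightly ahead of `now` is not negative lag.
    return max(now - watermark, timedelta(0))


def is_healthy(watermark, now, max_lag=timedelta(minutes=10)):
    """Hypothetical check: the pipeline is keeping up if the lag is bounded."""
    return watermark_lag(watermark, now) <= max_lag


now = datetime(2017, 11, 21, 15, 35, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=2)   # watermark two minutes behind
stale = now - timedelta(hours=1)     # watermark one hour behind
print(is_healthy(fresh, now), is_healthy(stale, now))
```

A sensible threshold depends on the sources and windowing in the pipeline, so
treat the 10 minutes here as a placeholder rather than a recommendation.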

On Tue, Nov 21, 2017 at 3:35 PM, Holden Karau wrote:

> Hi Wonderful BEAM users,
>
> I'm wondering if folks would be willing to share what kind of metrics they
> look at when validating their BEAM jobs?
>
> Cheers,
>
> Holden :)
>
> --
> Twitter: https://twitter.com/holdenkarau
>