We are using the spark-runner, and for *production monitoring* the most
useful metrics for us have been ingest rates, (batch) processing times,
and memory usage, all of which we obtain or calculate from the metrics
provided by the underlying Spark engine (e.g.,
totalProcessedRecords, lastCompletedBatch_*Delay, etc.).
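As a minimal sketch of the kind of derived metric we mean, here is how an ingest rate could be computed from two samples of the Spark-reported totalProcessedRecords counter. The sampling mechanism (how values are pulled from the Spark metrics sink) is assumed and not shown; the function name is ours.

```python
# Hypothetical helper: derive an ingest rate (records/second) from two
# samples of a monotonically increasing counter such as
# totalProcessedRecords, taken `seconds_elapsed` apart.
def ingest_rate(records_t0, records_t1, seconds_elapsed):
    """Records processed per second between two counter samples."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (records_t1 - records_t0) / seconds_elapsed

# e.g. the counter went from 10_000 to 70_000 over a 60 s window:
rate = ingest_rate(10_000, 70_000, 60)  # 1000.0 records/s
```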

Also, (Beam) source metrics such as backlog_elements can be helpful in
identifying when an application is unable to keep up with its input and
backlog starts building up.
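A toy sketch of what "backlog starts building up" can look like as a check over periodic samples of the source's backlog_elements metric. The sampling loop and metric retrieval are assumed; only the trend test is shown, and the function name is ours.

```python
# Hypothetical helper: flag a steadily growing backlog from periodic
# samples of backlog_elements (oldest sample first).
def backlog_growing(samples):
    """True if backlog_elements strictly increased across every
    consecutive pair of samples."""
    return all(b > a for a, b in zip(samples, samples[1:]))

backlog_growing([100, 250, 900, 4000])  # True: pipeline is falling behind
backlog_growing([100, 80, 120, 90])     # False: fluctuating, not growing
```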

If the question was more about correctness testing than ongoing
monitoring, then: what Lukasz said :)

On Wed, Nov 22, 2017 at 1:40 AM Lukasz Cwik <[email protected]> wrote:

> Are we talking about integration testing or general pipeline execution
> metrics?
>
> For integration testing, I would expect users to rely on PAssert and a test
> runner to do testing similar to Apache Beam's @ValidatesRunner or IO
> integration tests.
> For production pipeline monitoring, the most common metric I have heard
> about is tracking the watermark.
>
> But I'm sure others have different opinions.
>
> On Tue, Nov 21, 2017 at 3:35 PM, Holden Karau <[email protected]>
> wrote:
>
>> Hi Wonderful BEAM users,
>>
>> I'm wondering if folks would be willing to share what kind of metrics
>> they look at when validating their BEAM jobs?
>>
>> Cheers,
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
