Dataflow provides msec counters for each transform that executes. You
should be able to get them from stackdriver and see them from the Dataflow
UI.

You need to keep track of the timestamp of the element as it flows through
the system as part of data that goes alongside the element. You can use the
element's timestamp[1] if that makes sense (it might not if you intend to
use a timestamp that is from the kafka record itself and the record's
timestamp isn't the same as the ingestion timestamp). Unless you are
writing your own sink, the sink won't track the processing time at all so
you'll need to add a ParDo that goes right before it that writes the timing
information to wherever you want (a counter, your own metrics database,
logs, ...).

1:
https://github.com/apache/beam/blob/018e889829e300ab9f321da7e0010ff0011a73b1/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L257


On Thu, May 28, 2020 at 1:12 PM Talat Uyarer <tuya...@paloaltonetworks.com>
wrote:

> Yes I am trying to track how long it takes for a single element to be
> ingested into the pipeline until it is output somewhere.
>
> My pipeline is unbounded. I am using KafkaIO. I did not think about CPU
> time. if there is a way to track it too, it would be useful to improve my
> metrics.
>
> On Thu, May 28, 2020 at 12:52 PM Luke Cwik <lc...@google.com> wrote:
>
>> What do you mean by processing time?
>>
>> Are you trying to track how long it takes for a single element to be
>> ingested into the pipeline until it is output somewhere?
>> Do you have a bounded pipeline and want to know how long all the
>> processing takes?
>> Do you care about how much CPU time is being consumed in aggregate for
>> all the processing that your pipeline is doing?
>>
>>
>> On Thu, May 28, 2020 at 11:01 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> I am using Dataflow Runner. The pipeline read from kafkaIO and send
>>> Http. I could not find any metadata field on the element to set first read
>>> time.
>>>
>>> On Thu, May 28, 2020 at 10:44 AM Kyle Weaver <kcwea...@google.com>
>>> wrote:
>>>
>>>> Which runner are you using?
>>>>
>>>> On Thu, May 28, 2020 at 1:43 PM Talat Uyarer <
>>>> tuya...@paloaltonetworks.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a pipeline which has 5 steps. What is the best way to measure
>>>>> processing time for my pipeline?
>>>>>
>>>>> Thnaks
>>>>>
>>>>

Reply via email to