+dev <dev@beam.apache.org>

On Mon, Aug 5, 2019 at 12:49 PM Dmitry Minaev <mina...@gmail.com> wrote:

> Hi there,
>
> I'm building streaming pipelines in Beam (using Google Dataflow runner)
> and using Google Pubsub as a message broker. I've made a couple of
> experiments with a very simple pipeline: consume events from Pubsub
> subscription, add a timestamp to the message body, emit the new event to
> another Pubsub topic. I'm using all the default parameters when producing
> and consuming messages.
>
> I've noticed a pretty high latency while consuming messages in Dataflow
> from Pubsub. My observations show that average duration between the event
> create timestamp (simple producer that publishes events to Pubsub) and
> event consume timestamp (Google Dataflow using PubsubIO) is more than 2
> seconds. I've been publishing messages at different rates, e.g. 10 msg/sec,
> 1000 msg/sec, 10,000 msg/sec. And the latency never went lower than 2
> seconds. Such latency looks really high. I've also tried the direct
> runner and it shows high latency too.
>
> I've made a few other experiments with Kafka (very small Kafka cluster)
> and the same kind of pipeline: consume from Kafka, add timestamp, publish
> to another Kafka topic. I saw the latency is much lower, on average it's
> about 150 milliseconds.
>
> I suspect there is some batching in PubsubIO that makes the latency so
> high.
>
> My questions are: what latency should be expected in this kind of
> scenario? Are there any recommendations for achieving lower latency?
>
> I appreciate any help on this!
>
> Thank you,
> Dmitry.
>
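The pipeline body described above (consume, add a timestamp to the message body, re-emit) boils down to one small per-element transform. Here is a minimal sketch of that logic, assuming a JSON payload carrying a hypothetical `publish_ts` field in epoch seconds (both the field name and payload shape are assumptions, not from the original mail):

```python
import json
import time


def stamp_message(message_bytes: bytes) -> bytes:
    """Decode a Pub/Sub-style message body, add a consume-side timestamp,
    and re-encode it.

    Assumes the payload is JSON with a 'publish_ts' field (epoch seconds);
    the field names here are hypothetical.
    """
    msg = json.loads(message_bytes.decode("utf-8"))
    msg["consume_ts"] = time.time()  # receive-side wall-clock time
    # End-to-end latency as observed by the consumer, in milliseconds.
    msg["latency_ms"] = (msg["consume_ts"] - msg["publish_ts"]) * 1000.0
    return json.dumps(msg).encode("utf-8")
```

In a real Beam pipeline, a function like this would sit inside a `beam.Map` (or a DoFn) between the Pub/Sub read and write steps; the measured `latency_ms` is then what the thread above discusses, so any batching delay in the source shows up directly in that number.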
