Your understanding is correct - the data watermark only matters for
windowing; it does not affect auto-scaling. If the pipeline is not doing
any windowing, triggering, etc., then there is no need to pay the cost of
the second subscription.
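
For a job with no windowing or triggering, a minimal sketch of a read
without withTimestampAttribute might look like the following (the project
and subscription names are placeholders, not anything from your setup):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class NoWindowingJob {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Without withTimestampAttribute, elements are stamped with Pub/Sub's
        // publish time, and Dataflow does not create the extra internal
        // tracking subscription.
        PCollection<PubsubMessage> messages =
            p.apply("ReadFromPubSub",
                PubsubIO.readMessagesWithAttributes()
                    // placeholder subscription path
                    .fromSubscription("projects/my-project/subscriptions/my-sub")
                    // deduplication via the id attribute still works on its own
                    .withIdAttribute("unique_id"));

        p.run();
      }
    }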

On Thu, Aug 3, 2017 at 8:17 AM, Josh <jof...@gmail.com> wrote:

> Hi all,
>
> We've been running a few streaming Beam jobs on Dataflow, where each job
> is consuming from PubSub via PubSubIO. Each job does something like this:
>
> PubsubIO.readMessagesWithAttributes()
>     .withIdAttribute("unique_id")
>     .withTimestampAttribute("timestamp");
>
> My understanding of `withTimestampAttribute` is that it means we use the
> timestamp on the PubSub message as Beam's concept of time (the watermark) -
> so that any windowing we do in the job uses "event time" rather than
> "processing time".
>
> My question is: is my understanding correct, and does using
> `withTimestampAttribute` have any effect in a job that doesn't do any
> windowing? I have a feeling it may also have an effect on Dataflow's
> autoscaling, since I think Dataflow scales up when the watermark timestamp
> lags behind, but I'm not sure about this.
>
> The reason I'm concerned about this is that we've been using it in all
> our Dataflow jobs, and have now realised that whenever
> `withTimestampAttribute` is used, Dataflow creates an additional PubSub
> subscription (suffixed with `__streaming_dataflow_internal`), which
> appears to be doubling PubSub costs (since we pay per subscription)! So I
> want to remove `withTimestampAttribute` from jobs where possible, but want
> to first understand the implications.
>
> Thanks for any advice,
> Josh
>
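
For completeness, the windowed case is where you would keep
withTimestampAttribute: a rough sketch, again with a placeholder
subscription path, where the fixed windows operate on event time taken
from the "timestamp" attribute:

    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;

    // Event timestamps come from the "timestamp" attribute, so the watermark
    // tracks event time and the windows below are event-time windows.
    PCollection<PubsubMessage> windowed =
        p.apply(PubsubIO.readMessagesWithAttributes()
                .fromSubscription("projects/my-project/subscriptions/my-sub")
                .withIdAttribute("unique_id")
                .withTimestampAttribute("timestamp"))
            .apply(Window.into(FixedWindows.of(Duration.standardMinutes(5))));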
