Your understanding is correct - the data watermark only matters for windowing; it does not affect autoscaling. If the pipeline is not doing any windowing, triggering, etc., then there is no need to pay for the cost of the second subscription.
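For illustration, here is a minimal sketch of what such a read could look like with withTimestampAttribute removed for a job that does no windowing (the class name and topic path below are placeholders I made up; withIdAttribute is kept for deduplication as in your snippet):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class NoWindowingPubsubJob {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).create());

    // No windowing or triggering downstream, so the timestamp attribute (and
    // the extra __streaming_dataflow_internal subscription it brings) can be
    // dropped. The id attribute is still set for message deduplication.
    PCollection<PubsubMessage> messages = pipeline.apply(
        PubsubIO.readMessagesWithAttributes()
            .fromTopic("projects/my-project/topics/my-topic") // placeholder
            .withIdAttribute("unique_id"));

    // ... non-windowed processing of `messages` goes here ...

    pipeline.run();
  }
}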
On Thu, Aug 3, 2017 at 8:17 AM, Josh <jof...@gmail.com> wrote:
> Hi all,
>
> We've been running a few streaming Beam jobs on Dataflow, where each job
> is consuming from PubSub via PubSubIO. Each job does something like this:
>
> PubsubIO.readMessagesWithAttributes()
>     .withIdAttribute("unique_id")
>     .withTimestampAttribute("timestamp");
>
> My understanding of `withTimestampAttribute` is that it means we use the
> timestamp on the PubSub message as Beam's concept of time (the watermark) -
> so that any windowing we do in the job uses "event time" rather than
> "processing time".
>
> My question is: is my understanding correct, and does using
> `withTimestampAttribute` have any effect in a job that doesn't do any
> windowing? I have a feeling it may also have an effect on Dataflow's
> autoscaling, since I think Dataflow scales up when the watermark timestamp
> lags behind, but I'm not sure about this.
>
> The reason I'm concerned about this is because we've been using it in all
> our Dataflow jobs, and have now realised that whenever
> `withTimestampAttribute` is used, Dataflow creates an additional PubSub
> subscription (suffixed with `__streaming_dataflow_internal`), which
> appears to be doubling PubSub costs (since we pay per subscription)! So I
> want to remove `withTimestampAttribute` from jobs where possible, but want
> to first understand the implications.
>
> Thanks for any advice,
> Josh