Re: Is it safe to cache the value of a singleton view (with a global window) in a DoFn?
Keep your code simple and rely on the runner to cache the value locally, so that reading it should be very cheap. If you hit a performance issue because a runner lacks caching, it would be good to hear about it so we can file a JIRA.

On Mon, May 6, 2019 at 4:24 PM Kenneth Knowles wrote:

> A singleton view in the global window with no triggering does have just a
> single immutable value. (It really ought to have an updated value in the
> presence of triggers, but I believe you will receive a crash instead. I
> haven't tested.)
>
> In general, a side input yields one value per window. Dataflow in batch
> will already do what you describe, but works with all windows. Dataflow in
> streaming has some caching, but if you see a problem, that is interesting
> information.
>
> Kenn
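For illustration, here is a minimal sketch of what "keep it simple" could look like: the DoFn reads the view through the context on every element instead of stashing it in a field. The names TagWithStartTimeFn, TagWithStartTime.tag, lines and startTime are hypothetical and not from this thread; only DoFn, ParDo.withSideInputs and ProcessContext.sideInput are Beam Java SDK calls. Depending on the setup, an explicit coder for the output may still be needed.

```scala
import java.lang.{Long => JLong}

import org.apache.beam.sdk.transforms.{DoFn, ParDo}
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.apache.beam.sdk.values.{KV, PCollection, PCollectionView}

// Hypothetical DoFn: pairs each element with the singleton side-input value.
class TagWithStartTimeFn(startTime: PCollectionView[JLong])
    extends DoFn[String, KV[String, JLong]] {

  @ProcessElement
  def processElement(c: DoFn[String, KV[String, JLong]]#ProcessContext): Unit = {
    // Read the view on every call rather than caching it in a field;
    // the runner is expected to make repeated reads cheap.
    val millis: JLong = c.sideInput(startTime)
    c.output(KV.of(c.element(), millis))
  }
}

object TagWithStartTime {
  // Hypothetical helper: the view must be registered with withSideInputs,
  // otherwise c.sideInput(startTime) fails at runtime.
  def tag(lines: PCollection[String],
          startTime: PCollectionView[JLong]): PCollection[KV[String, JLong]] =
    lines.apply(ParDo.of(new TagWithStartTimeFn(startTime)).withSideInputs(startTime))
}
```

Because the read goes through the context each time, the same code stays correct if the view is later windowed, where a value cached in a field could belong to the wrong window.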
Re: Is it safe to cache the value of a singleton view (with a global window) in a DoFn?
A singleton view in the global window with no triggering does have just a single immutable value. (It really ought to have an updated value in the presence of triggers, but I believe you will receive a crash instead. I haven't tested.)

In general, a side input yields one value per window. Dataflow in batch will already do what you describe, but works with all windows. Dataflow in streaming has some caching, but if you see a problem, that is interesting information.

Kenn

On Sat, May 4, 2019 at 9:19 AM Steve Niemitz wrote:

> I have a singleton view in a global window that is read from a DoFn. I'm
> curious if it's "correct" to cache that value from the view, or if I need to
> read it every time.
>
> As a (simplified) example, if I were to generate the view as such:
>
>     input.getPipeline
>       .apply(Create.of(Collections.singleton[Void](null)))
>       .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
>         override def apply(input: Void): JLong = {
>           Instant.now().getMillis
>         }
>       }))
>       .apply(View.asSingleton[JLong]())
>
> and then read it from a DoFn (using context.sideInput), is it guaranteed
> that:
> - every instance of the DoFn will read the same value?
> - the value will never change?
>
> If so it seems like it'd be safe to cache the value inside the DoFn. It
> seems like this would be the case, but I've also seen cases in Dataflow
> where the UI indicates that the MapElements step above produced more than
> one element, so I'm curious what people have to say.
>
> Thanks!
Is it safe to cache the value of a singleton view (with a global window) in a DoFn?
I have a singleton view in a global window that is read from a DoFn. I'm curious if it's "correct" to cache that value from the view, or if I need to read it every time.

As a (simplified) example, if I were to generate the view as such:

    input.getPipeline
      .apply(Create.of(Collections.singleton[Void](null)))
      .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
        override def apply(input: Void): JLong = {
          Instant.now().getMillis
        }
      }))
      .apply(View.asSingleton[JLong]())

and then read it from a DoFn (using context.sideInput), is it guaranteed that:
- every instance of the DoFn will read the same value?
- the value will never change?

If so it seems like it'd be safe to cache the value inside the DoFn. It seems like this would be the case, but I've also seen cases in Dataflow where the UI indicates that the MapElements step above produced more than one element, so I'm curious what people have to say.

Thanks!