Thanks, guys, for the quick replies! I just realized I had a silly bug in my double-checked locking implementation. :)
Things appear to be working fine now. There doesn't seem to be anything strange about static variables after all.

Thanks!
Derek

On Sat, Oct 7, 2017 at 4:07 PM, Kevin Peterson <[email protected]> wrote:

> I've had good luck in a similar scenario using a static instance of
> Guava's loading cache, and fetching from GCS inside the load function.
>
> On Oct 7, 2017 3:52 PM, "Eugene Kirpichov" <[email protected]> wrote:
>
> Hi,
> I'm not sure what you mean by this: "But they are non-serializable so I
> can't just create a static constructor and create it while starting the
> pipeline."
>
> You can definitely use static variables in DoFns, the same way you can use
> them in any other Java code. I'm not sure how serializability is an issue
> here, because Java serialization doesn't serialize static variables - you
> serialize object instances, and static variables do not belong to the
> object instance (unless, of course, you're explicitly holding a reference
> to the static variable through your instance). Did you hit a
> NotSerializableException? Can you show your code and/or try running with
> the JVM flag -Dsun.io.serialization.extendedDebugInfo=true ?
>
> You need to be very careful with thread safety though - indeed there will
> be multiple threads running a given DoFn on a given worker (these will be
> different *instances* of the same DoFn class, but they'll of course still
> concurrently access the static variables).
>
> On Sat, Oct 7, 2017 at 3:44 PM Derek Hao Hu <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm looking for ways to use a static variable in a DoFn. The background
>> is that at run time I need to construct some non-serializable (but
>> expensive) variables from some binary blobs downloaded from GCS buckets.
>>
>> The fact that the construction of these models is expensive makes me
>> feel I should try to make them static, or at least static to each worker.
>> But they are non-serializable so I can't just create a static constructor
>> and create it while starting the pipeline.
>>
>> Originally I thought DoFn.Setup was what I needed, but after trying it, it
>> seems DoFn.Setup is executed per thread instead of per worker. Is there
>> anything we can use to create something that is shared by multiple
>> threads?
>>
>> Thanks,
>> --
>> Derek Hao Hu
>>
>> Software Engineer | Snapchat
>> Snap Inc.
>>
>
>

--
Derek Hao Hu
Software Engineer | Snapchat
Snap Inc.
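
For anyone finding this thread later, here is a minimal sketch of the pattern discussed above: a lazily built static object shared by all threads in one JVM, guarded by double-checked locking. It is not the code from my pipeline. ExpensiveModel, SharedModelHolder and SharedModelDemo are hypothetical names, and the model constructor is a placeholder for wherever the GCS blobs would be downloaded and parsed. The sketch uses only the JDK so it can be run on its own. In a DoFn, the assumption is that you would call SharedModelHolder.get() from the setup or process method instead of storing the model in an instance field.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedModelDemo {

    /** Hypothetical stand-in for an expensive, non-serializable model. */
    static final class ExpensiveModel {
        // Counts constructions so the demo can show the model is built once.
        static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

        ExpensiveModel() {
            CONSTRUCTIONS.incrementAndGet();
            // Placeholder: a real model would download and parse its blobs here.
        }
    }

    /** Holds one model per JVM (i.e. per worker process), built on first use. */
    static final class SharedModelHolder {
        // volatile is required; without it double-checked locking is not safe,
        // because another thread could observe a partially constructed object.
        private static volatile ExpensiveModel model;

        private SharedModelHolder() {}

        static ExpensiveModel get() {
            ExpensiveModel local = model;          // first check, no lock taken
            if (local == null) {
                synchronized (SharedModelHolder.class) {
                    local = model;                 // second check, under the lock
                    if (local == null) {
                        local = new ExpensiveModel();
                        model = local;             // publish only after construction
                    }
                }
            }
            return local;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate several worker threads asking for the model concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> { SharedModelHolder.get(); });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("constructions = " + ExpensiveModel.CONSTRUCTIONS.get());
    }
}
```

The usual ways to get this wrong are forgetting volatile on the static field or skipping the second null check inside the synchronized block. The static field is never part of the serialized DoFn, which is the point Eugene made above. If the lazy download is not needed, a plain static final field or the Guava loading cache Kevin mentioned avoids hand-written locking altogether.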
