I've had good luck in a similar scenario using a static instance of Guava's
LoadingCache and fetching from GCS inside its load function.
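
For illustration, here is a rough sketch of that pattern using only the JDK.
ConcurrentHashMap.computeIfAbsent stands in for Guava's LoadingCache here, and
the names CachedModel, ModelCache and the gs:// path are hypothetical; the
loader is a stub where a real GCS download would go.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive, non-serializable model.
final class CachedModel {
    final byte[] blob;

    CachedModel(byte[] blob) {
        this.blob = blob;
    }
}

final class ModelCache {
    // Counts loader calls so the demo can show each key is loaded once.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Static: one map per JVM, shared by every thread and DoFn instance in it.
    private static final ConcurrentHashMap<String, CachedModel> CACHE =
            new ConcurrentHashMap<>();

    // Stub loader (assumption): real code would fetch the blob from GCS here.
    private static CachedModel load(String path) {
        LOADS.incrementAndGet();
        return new CachedModel(path.getBytes(StandardCharsets.UTF_8));
    }

    // computeIfAbsent applies the loader at most once per key.
    static CachedModel get(String path) {
        return CACHE.computeIfAbsent(path, ModelCache::load);
    }
}

final class ModelCacheDemo {
    public static void main(String[] args) {
        CachedModel a = ModelCache.get("gs://hypothetical-bucket/model.bin");
        CachedModel b = ModelCache.get("gs://hypothetical-bucket/model.bin");
        System.out.println(a == b);                 // true
        System.out.println(ModelCache.LOADS.get()); // 1
    }
}
```

Guava's cache additionally offers size limits and expiry, which this plain map
does not.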

On Oct 7, 2017 3:52 PM, "Eugene Kirpichov" <[email protected]> wrote:

Hi,
I'm not sure what you mean by this: "But they are non-serializable so I
can't just create a static constructor and create it while starting the
pipeline."

You can definitely use static variables in DoFns, the same way you can use
them in any other Java code. I'm not sure how serializability is an issue
here, because Java serialization doesn't serialize static variables: you
serialize object instances, and static variables do not belong to the
object instance (unless, of course, you're explicitly holding a reference to
the static variable through your instance). Did you hit a
NotSerializableException? Can you show your code and/or try running with
the JVM flag -Dsun.io.serialization.extendedDebugInfo=true?
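
To make the point about static fields concrete, here is a small sketch using
plain Java serialization. The class names are hypothetical stand-ins (this is
not Beam code): one Serializable class keeps a non-serializable object in a
static field, the other holds the same object through an instance field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical expensive object; deliberately not Serializable.
final class NonSerializableModel {
}

// Stand-in for a DoFn: Serializable, with the model in a static field.
class StaticHolderFn implements Serializable {
    private static final long serialVersionUID = 1L;
    // Static fields are not part of any instance, so they are never written.
    static final NonSerializableModel SHARED = new NonSerializableModel();
}

// Same model, but referenced from an instance field: now it gets serialized.
class InstanceHolderFn implements Serializable {
    private static final long serialVersionUID = 1L;
    final NonSerializableModel held = StaticHolderFn.SHARED;
}

final class StaticSerializationDemo {
    // Returns true if Java serialization of obj succeeds.
    static boolean serializes(Object obj) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new StaticHolderFn()));   // true
        System.out.println(serializes(new InstanceHolderFn())); // false
    }
}
```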

You need to be very careful with thread safety though - indeed there will
be multiple threads running a given DoFn on a given worker (these will be
different *instances* of the same DoFn class, but they'll of course still
concurrently access the static variables).
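
One way to get safe one-time construction of such a shared static is the
initialization-on-demand holder idiom, sketched below. SharedModel and its
counter are hypothetical names for illustration. The idiom only makes the
*initialization* thread-safe; the shared object itself still has to tolerate
concurrent use (for example by being immutable).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive object shared by all threads in one JVM.
final class SharedModel {
    // Counts constructor calls so the demo can verify single construction.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    private SharedModel() {
        // Expensive setup (e.g. parsing a downloaded blob) would go here.
        CONSTRUCTIONS.incrementAndGet();
    }

    // The JVM initializes Holder lazily and exactly once, under its own lock.
    private static final class Holder {
        static final SharedModel INSTANCE = new SharedModel();
    }

    static SharedModel get() {
        return Holder.INSTANCE;
    }
}

final class SharedModelDemo {
    // Calls SharedModel.get() from several threads; returns constructor count.
    static int constructionsAfter(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> SharedModel.get());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return SharedModel.CONSTRUCTIONS.get();
    }

    public static void main(String[] args) {
        System.out.println(constructionsAfter(8)); // 1
    }
}
```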

On Sat, Oct 7, 2017 at 3:44 PM Derek Hao Hu <[email protected]> wrote:

> Hi,
>
> I'm looking for ways to use a static variable in a DoFn. The background is
> that at run time I need to construct some non-serializable (and expensive)
> variables from some binary blobs downloaded from GCS buckets.
>
> The fact that the construction of these models is expensive makes me feel
> I should try to make them static, or at least static to each worker. But
> they are non-serializable so I can't just create a static constructor and
> create it while starting the pipeline.
>
> Originally I thought DoFn.Setup was what I needed, but after trying it, it
> seems DoFn.Setup is executed per thread instead of per worker. Is there
> anything we can use to create something that is shared by multiple
> threads?
>
> Thanks,
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
>
