I'm interested in this area too. One limitation, I think, is that this approach only gives you globally unique singletons if your runner uses a single JVM. That holds for the DirectRunner, which is mostly what I'm using (I'm still new to all this). I suppose this would be a more challenging problem on more distributed runners.
One tip I would give for your code is to protect your singleton return values, for example the Set<Integer> that you return from getOrCreateSingletonAllowedCities. If you want that set to be modifiable, you should wrap it using something like Collections.synchronizedSet(). If you want it to be immutable, use Collections.unmodifiableSet(). Note that even if the framework solves the general problem of making these singletons globally available, you will still need to make the singletons themselves thread-safe.

--Cam

On Thu, Apr 30, 2020 at 12:45 PM Jeff Klukas <[email protected]> wrote:

> Beam Java users,
>
> I've run into a few cases where I want to present a single thread-safe
> data structure to all threads on a worker, and I end up writing a good bit
> of custom code each time involving a synchronized method that handles
> creating the resource exactly once, and then each thread has its own
> reference to the singleton. I don't have extensive experience with thread
> safety in Java, so it seems likely I'm going to get this wrong.
>
> Are there any best practices for state that is shared across threads? Any
> prior art I can read up on?
>
> The most concrete case I have in mind is loading a GeoIP database for
> doing city lookups from IP addresses. We're using MaxMind's API which
> allows mapping a portion of memory to a file sitting on disk. We have a
> synchronized method that checks if the reader has been initialized [0]; if
> not, we copy the database file from GCS to local disk, build the
> DatabaseReader instance, and return it. Other threads will see the
> already-initialized reader and just get a reference to it instead.
>
> This all appears to work, and it saves memory compared to each thread
> maintaining their own DatabaseReader. But is there a safer or more built-in
> way to do this? Am I missing relevant hooks in the Beam API that would make
> this cleaner?
>
> [0]
> https://github.com/mozilla/gcp-ingestion/blob/master/ingestion-beam/src/main/java/com/mozilla/telemetry/decoder/GeoCityLookup.java#L95
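
P.S. To make the tip about protecting singleton return values concrete, here is a minimal sketch. It is not the code from the linked GeoCityLookup class: the AllowedCities holder class, the no-argument signature, and the hard-coded city IDs are all assumptions for illustration. It only shows the general shape of a synchronized create-once method that hands out a read-only view, and it is still one instance per JVM, not per pipeline.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class; the class name and the IDs below are placeholders.
final class AllowedCities {
  // Written and read only inside the synchronized method below.
  private static Set<Integer> allowedCities;

  private AllowedCities() {}

  // synchronized: at most one thread performs the initialization, and every
  // later caller sees the fully built set.
  static synchronized Set<Integer> getOrCreateSingletonAllowedCities() {
    if (allowedCities == null) {
      Set<Integer> loaded = new HashSet<>();
      loaded.add(1); // placeholder data; real code would load this from a file
      loaded.add(2);
      loaded.add(3);
      // Hand out a read-only view so no caller can mutate the shared state.
      allowedCities = Collections.unmodifiableSet(loaded);
    }
    return allowedCities;
  }
}
```

Every caller gets the same instance, and a call such as add() on the returned set throws UnsupportedOperationException. If the set really has to be modifiable, wrapping it with Collections.synchronizedSet(new HashSet<>()) instead makes individual calls thread-safe, but callers still have to synchronize on the set themselves while iterating over it.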
