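That's correct. DoFns are serialized as part of the pipeline description, shipped to the workers, and deserialized there. Standard Java serialization is used, and on deserialization it doesn't call the class's constructor: it creates the instance directly (even if the class doesn't declare a default constructor) and repopulates the fields from the serialized data. So a value you set in the constructor is computed once, where you construct the DoFn, and every deserialized copy carries that same value.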
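
In case it helps to see this outside of Beam, here is a minimal sketch using only plain java.io serialization. The class names (InstanceIdDemo, ConstructorIdFn, ReadObjectIdFn) are made up for illustration and are not Beam classes; they just stand in for a Serializable DoFn. It shows that an id assigned in the constructor survives unchanged in every deserialized copy, while an id assigned in readObject() is different for each copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;

public class InstanceIdDemo {

    // Hypothetical stand-in for a DoFn that picks its id in the constructor.
    static class ConstructorIdFn implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;

        ConstructorIdFn() {
            // Runs only when the object is constructed with "new",
            // not when it is deserialized.
            this.id = UUID.randomUUID().toString();
        }
    }

    // Hypothetical stand-in for a DoFn that picks its id in readObject().
    static class ReadObjectIdFn implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient: the id is not written to the stream at all.
        transient String id = UUID.randomUUID().toString();

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Called once per deserialized copy, so each copy gets a fresh id.
            this.id = UUID.randomUUID().toString();
        }
    }

    // Serializes obj to bytes and reads a new copy back, roughly what
    // happens when an object is shipped to another JVM.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConstructorIdFn a = new ConstructorIdFn();
        System.out.println("constructor id, copies share it: "
                + roundTrip(a).id.equals(roundTrip(a).id));

        ReadObjectIdFn b = new ReadObjectIdFn();
        System.out.println("readObject id, copies share it: "
                + roundTrip(b).id.equals(roundTrip(b).id));
    }
}
```

In a real DoFn, assigning the id in a @Setup method plays the same role as readObject() here, as suggested earlier in the thread.
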

On Sun, Nov 19, 2017, 7:07 PM Jacob Marble <jmar...@kochava.com> wrote:

> Eugene, that worked. Can you explain why this doesn't work when I set the
> UUID (or random value) from the constructor?
>
> It looks like the DoFn constructor is called once by the worker, then that
> constructed object is copied as many times as needed, each instance getting
> its own thread and @Setup,@StartBundle,@etc loop. Is that correct?
>
> Thanks for the help.
>
> Jacob
>
> On Sun, Nov 19, 2017 at 10:24 AM, Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> You could create a private variable with a UUID, filled in in @Setup or
>> (if you're hitting that bug where @Setup wasn't being called) in
>> readObject()?
>>
>> On Sun, Nov 19, 2017 at 8:17 AM Jacob Marble <jmar...@kochava.com> wrote:
>>
>>> Is there a recommended way to get a unique id for each instance of a
>>> DoFn?
>>>
>>> - DataflowWorkerHarnessOptions.getWorkerId() only returns a unique id
>>> per worker, which can contain multiple instances of a DoFn.
>>> - Looks like ThreadLocalRandom is seeded with the same value on every
>>> instance
>>> - Thinking I'll try workerId + construction timestamp next
>>>
>>> Jacob