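That's correct. A DoFn is serialized into the pipeline description,
shipped to the workers, and deserialized there. Standard Java
serialization is used, and deserializing an object doesn't call its
class's constructor: it creates the instance directly (even if the class
doesn't declare a default constructor) and repopulates the fields from
the serialized data. So a value assigned to a regular field in the
constructor is computed once and then copied into every deserialized
instance.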
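
To make that concrete, here is a minimal sketch in plain Java with no
Beam dependency. IdFn is a made-up stand-in for a DoFn, and roundTrip()
is a hypothetical helper that only imitates the serialize/ship/deserialize
step. It shows that the constructor runs once, that an id set in the
constructor is shared by every copy, and that an id set in readObject()
is unique per copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;

public class InstanceIdDemo {

    /** Hypothetical stand-in for a DoFn; only its serialization behavior matters. */
    static class IdFn implements Serializable {
        private static final long serialVersionUID = 1L;

        // Counts constructor invocations in this JVM (static, so never serialized).
        static int constructorCalls = 0;

        // Regular field: written to the stream, so every copy sees the same value.
        final UUID constructorId;

        // Transient field: skipped by serialization, so each copy fills it in itself.
        transient UUID instanceId;

        IdFn() {
            constructorCalls++;
            constructorId = UUID.randomUUID();
            instanceId = UUID.randomUUID();
        }

        // Java serialization calls this hook on each deserialized copy.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            instanceId = UUID.randomUUID();
        }
    }

    /** Serializes fn to bytes and reads back a fresh copy. */
    static IdFn roundTrip(IdFn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IdFn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IdFn original = new IdFn();
        IdFn copyA = roundTrip(original);
        IdFn copyB = roundTrip(original);

        System.out.println("constructor calls: " + IdFn.constructorCalls);
        System.out.println("same constructorId: "
                + copyA.constructorId.equals(copyB.constructorId));
        System.out.println("same instanceId: "
                + copyA.instanceId.equals(copyB.instanceId));
    }
}
```

In a real DoFn, @Setup is the more natural place for this; readObject()
is the fallback if @Setup isn't being called.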

On Sun, Nov 19, 2017, 7:07 PM Jacob Marble <jmar...@kochava.com> wrote:

> Eugene, that worked. Can you explain why this doesn't work when I set the
> UUID (or random value) from the constructor?
>
> It looks like the DoFn constructor is called once by the worker, then that
> constructed object is copied as many times as needed, each instance getting
> its own thread and @Setup, @StartBundle, @etc loop. Is that correct?
>
> Thanks for the help.
>
> Jacob
>
> On Sun, Nov 19, 2017 at 10:24 AM, Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> You could create a private variable with a UUID, filled in in @Setup or
>> (if you're hitting that bug where @Setup wasn't being called) in
>> readObject()?
>>
>> On Sun, Nov 19, 2017 at 8:17 AM Jacob Marble <jmar...@kochava.com> wrote:
>>
>>> Is there a recommended way to get a unique id for each instance of a
>>> DoFn?
>>>
>>> - DataflowWorkerHarnessOptions.getWorkerId() only returns a unique id
>>> per worker, which can contain multiple instances of a DoFn.
>>> - Looks like ThreadLocalRandom is seeded with the same value on every
>>> instance
>>> - Thinking I'll try workerId + construction timestamp next
>>>
>>> Jacob
>>>
>>
>
