It means that your DoFn class doesn't need to be thread safe, because when a runner wants to run it in multiple threads, it will create multiple copies of your DoFn.
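
To make that concrete, here is a rough plain-Java sketch of the idea. It is not Beam code and not how any particular runner is implemented: CountingFn, copyOf and runOnThreads are hypothetical names made up for illustration, and copying via a Java serialization round trip is only an assumption standing in for however a runner produces its copies. The point is just that the unsynchronized counter is safe, because each thread works on its own copy and calls it serially.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class DoFnCloneSketch {

    // Hypothetical stand-in for a DoFn: mutable state, no synchronization.
    static class CountingFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private int seen = 0; // deliberately not thread safe

        void processElement(String element) {
            seen++; // safe only because one thread owns this instance
        }

        int seen() {
            return seen;
        }
    }

    // Assumption: copies are made by a serialization round trip.
    static CountingFn copyOf(CountingFn original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CountingFn) in.readObject();
        }
    }

    // Toy "runner": one copy per thread, each copy used by exactly one thread.
    static int[] runOnThreads(CountingFn original, int threads, int elementsPerThread)
            throws Exception {
        CountingFn[] copies = new CountingFn[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            CountingFn copy = copyOf(original);
            copies[i] = copy;
            workers[i] = new Thread(() -> {
                for (int n = 0; n < elementsPerThread; n++) {
                    copy.processElement("element-" + n);
                }
            });
        }
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // join makes each copy's final state visible here
        }
        int[] counts = new int[threads];
        for (int i = 0; i < threads; i++) {
            counts[i] = copies[i].seen();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = runOnThreads(new CountingFn(), 4, 10_000);
        System.out.println(Arrays.toString(counts)); // [10000, 10000, 10000, 10000]
    }
}
```

If all four threads shared a single CountingFn instead, the plain seen++ would race and the count could come out short. That is the situation the per-thread copies avoid.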

On Mon, Oct 16, 2017, 10:27 AM Derek Hao Hu <[email protected]> wrote:

> Hi Eugene,
>
> I'm not sure I understand what you mean - can you explain a bit more about
> "an individual instance will be accessed only serially but not
> concurrently"?
>
> Thanks,
>
> Derek
>
> On Mon, Oct 16, 2017 at 8:50 AM, Eugene Kirpichov <[email protected]>
> wrote:
>
>> A worker can execute several instances of the same DoFn at the same time.
>> They will be clones of the original DoFn specified in the pipeline and an
>> individual instance will be accessed only serially but not concurrently.
>>
>> On Mon, Oct 16, 2017, 8:38 AM Jacob Marble <[email protected]> wrote:
>>
>>> Perfect, thanks.
>>>
>>> Jacob
>>>
>>> On Sun, Oct 15, 2017 at 11:43 PM, Jean-Baptiste Onofré <[email protected]>
>>> wrote:
>>>
>>>> Yes, no problem at all. I meant that the DoFn is "attached" to a
>>>> pipeline.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 10/16/2017 08:25 AM, Derek Hao Hu wrote:
>>>>
>>>>> I believe a worker can execute multiple instances (i.e. threads) of a
>>>>> DoFn.
>>>>>
>>>>> Derek
>>>>>
>>>>> On Sun, Oct 15, 2017 at 10:46 PM, Jean-Baptiste Onofré
>>>>> <[email protected]> wrote:
>>>>>
>>>>>     Hi,
>>>>>
>>>>>     Correct, @setup is used when bootstrapping the DoFn, @StartBundle
>>>>>     is called for a set of data (bundle), @ProcessElement is for each
>>>>>     element in the bundle/collection, @FinishBundle at the end of the
>>>>>     dataset (bundle), @Teardown is called when the DoFn is "removed".
>>>>>
>>>>>     A DoFn is per pipeline.
>>>>>
>>>>>     Regards
>>>>>     JB
>>>>>
>>>>>     On 10/16/2017 07:31 AM, Jacob Marble wrote:
>>>>>
>>>>>         (there might be documentation on this that I didn't find; if
>>>>>         so a link is sufficient)
>>>>>
>>>>>         Good evening, this is just a check on my understanding. It
>>>>>         looks like an instance of a given DoFn goes through this
>>>>>         lifecycle. Am I correct?
>>>>>
>>>>>         - constructor
>>>>>         - @Setup (once)
>>>>>         - @StartBundle (zero to many times)
>>>>>         - @ProcessContext (zero to many times)
>>>>>         - @FinishBundle
>>>>>         - @Teardown (once)
>>>>>
>>>>>         Can any of these steps be called concurrently? (I believe no)
>>>>>         Can one worker execute multiple instances of a DoFn? (I
>>>>>         believe yes)
>>>>>
>>>>>         Thank you,
>>>>>
>>>>>         Jacob
>>>>>
>>>>>     --
>>>>>     Jean-Baptiste Onofré
>>>>>     [email protected]
>>>>>     http://blog.nanthrax.net
>>>>>     Talend - http://www.talend.com
>>>>>
>>>>> --
>>>>> Derek Hao Hu
>>>>>
>>>>> Software Engineer | Snapchat
>>>>> Snap Inc.
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> [email protected]
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
