Re: @DoFn.Setup not called

2017-11-21 Thread Jacob Marble
Cool! Thanks Kenn.
Jacob
On Mon, Nov 20, 2017 at 9:57 AM, Kenneth Knowles wrote:
> I wanted to follow up that this has been reproduced and diagnosed, and a
> fix is underway. The ticket to follow is
> https://issues.apache.org/jira/browse/BEAM-3219.
>
> Kenn
>
> On Fri, Nov 17, 2017 at 12:23 P

Re: @DoFn.Setup not called

2017-11-20 Thread Kenneth Knowles
I wanted to follow up that this has been reproduced and diagnosed, and a fix is underway. The ticket to follow is https://issues.apache.org/jira/browse/BEAM-3219.
Kenn
On Fri, Nov 17, 2017 at 12:23 PM, Jacob Marble wrote:
> Here is a small pipeline job that fails using the Dataflow runner, but

Re: @DoFn.Setup not called

2017-11-20 Thread Kenneth Knowles
On Fri, Nov 17, 2017 at 8:38 PM, Jacob Marble wrote:
> I also notice that stateful DoFn's seem to only be instantiated once in
> Dataflow, but multiple instances do end up being created in the direct
> runner. Is there a story behind that?
>
The runner is free to instantiate a DoFn as often as i

Re: @DoFn.Setup not called

2017-11-17 Thread Jacob Marble
I also notice that stateful DoFn's seem to only be instantiated once in Dataflow, but multiple instances do end up being created in the direct runner. Is there a story behind that?
Jacob
On Fri, Nov 17, 2017 at 7:22 PM, Jacob Marble wrote:
> Noticing some related and unexpected differences betw

Re: @DoFn.Setup not called

2017-11-17 Thread Jacob Marble
Noticing some related and unexpected differences between batch and streaming pipelines. Why does a stateful DoFn behave like GroupByKey (no data output until all data input is complete) in a batch pipeline, but not in a streaming pipeline? It looks like BatchStatefulParDoOverrides has something to

Re: @DoFn.Setup not called

2017-11-17 Thread Jacob Marble
Here is a small pipeline job that fails using the Dataflow runner, but doesn't fail using the direct runner.
https://gist.github.com/jacobmarble/804c2edb9c80a2863f3e671d6851a55f
Jacob
On Fri, Nov 17, 2017 at 9:27 AM, Kenneth Knowles wrote:
> It is definitely a big deal if @Setup is not getting

Re: @DoFn.Setup not called

2017-11-17 Thread Kenneth Knowles
It is definitely a big deal if @Setup is not getting called! There are no special cases that would skip @Setup. Please do report what you can. That said, lazily doing setup (via null check or some such as you mention) is perfectly fine and often a more robust programming pattern. Upside: you can't

Re: @DoFn.Setup not called

2017-11-17 Thread Jacob Marble
I tried to write a simpler DoFn that induces the error, but it works fine. Working around the issue today by using @StartBundle with a null check, and that seems to be working. If this really is a big deal, then it needs to be reported, so I'll try to find time to write a broken example. Jacob O
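
The workaround mentioned above (initializing in @StartBundle behind a null check rather than relying on @Setup) might look roughly like the sketch below. It is a plain-Java stand-in with no Beam dependency so that it runs on its own; the class LazySetupSketch, the Resource type, and getInitCount() are hypothetical names for illustration and come neither from the thread nor from Beam. In a real DoFn, startBundle() would carry the @StartBundle annotation and processElement() the @ProcessElement annotation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for a Beam DoFn (assumption: no Beam classes are used,
// so the lifecycle calls are made by hand in main()).
class LazySetupSketch {
    // Hypothetical expensive resource, standing in for whatever the
    // setup method would normally have built.
    static final class Resource {
        String decorate(String s) {
            return "[" + s + "]";
        }
    }

    private Resource resource;   // stays null until the first bundle on this instance
    private int initCount = 0;   // demonstration only: counts real initializations

    // Would be annotated @StartBundle in a real DoFn.
    // The null check means the resource is built at most once per instance.
    void startBundle() {
        if (resource == null) {
            resource = new Resource();
            initCount++;
        }
    }

    // Would be annotated @ProcessElement in a real DoFn.
    String processElement(String element) {
        return resource.decorate(element);
    }

    int getInitCount() {
        return initCount;
    }

    public static void main(String[] args) {
        LazySetupSketch fn = new LazySetupSketch();
        List<String> out = new ArrayList<>();
        // Simulate three bundles of one element each on the same instance.
        for (int bundle = 0; bundle < 3; bundle++) {
            fn.startBundle();
            out.add(fn.processElement("e" + bundle));
        }
        System.out.println(out + " initCount=" + fn.getInitCount());
    }
}
```

Because the field is checked at the start of every bundle, the resource gets built on first use whether or not the runner ever invoked a setup hook on that instance.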

Re: @DoFn.Setup not called

2017-11-16 Thread Eugene Kirpichov
Could you give more details, e.g. a code snippet that reproduces the issue, and describe how you determine that @Setup hasn't been called?
On Thu, Nov 16, 2017 at 6:58 PM Derek Hao Hu wrote:
> I've been using DoFn.Setup method in Dataflow and it seems to be working
> fine.
>
> On Thu, Nov 16,

Re: @DoFn.Setup not called

2017-11-16 Thread Derek Hao Hu
I've been using DoFn.Setup method in Dataflow and it seems to be working fine.
On Thu, Nov 16, 2017 at 4:56 PM, Jacob Marble wrote:
> This one is weird.
>
> A DoFn I wrote:
> - stateful
> - used plenty in a streaming pipeline
> - direct and dataflow runners
> - works fine
>
> Now:
> - new batc

@DoFn.Setup not called

2017-11-16 Thread Jacob Marble
This one is weird.
A DoFn I wrote:
- stateful
- used plenty in a streaming pipeline
- direct and dataflow runners
- works fine
Now:
- new batch pipeline
- @DoFn.Setup method not called
- direct runner works properly (logs from setup method are output)
- dataflow runner simply doesn't call the set