Re: Using side inputs in any user code via thread-local side input accessor

2017-10-13 Thread Eugene Kirpichov
The PR has been merged, and now Beam at HEAD supports side inputs in MapElements, FlatMapElements, Watch, and the machinery can be used in more transforms. Next step is to proceed with FileIO.write(). On Thu, Oct 5, 2017 at 11:09 AM Kenneth Knowles wrote: > Sorry for the delayed reply. This may

Re: Using side inputs in any user code via thread-local side input accessor

2017-10-05 Thread Kenneth Knowles
Sorry for the delayed reply. This may be a non-issue, but my overarching comment was to address how (if at all) this relates to the portable model of a pipeline. One easy way to avoid violating this is to wait until https://github.com/apache/beam/pull/3938 is completed. This includes a portability

Re: Using side inputs in any user code via thread-local side input accessor

2017-10-04 Thread Eugene Kirpichov
A bunch of people have commented on the doc, without it seems any major disagreement. The PR is out for review. On Fri, Sep 29, 2017 at 1:53 PM Eugene Kirpichov wrote: > Hi all, > > Please take a look at some notes from a discussion we had about this with > a few folks, and an updated proposal a

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-29 Thread Eugene Kirpichov
Hi all, Please take a look at some notes from a discussion we had about this with a few folks, and an updated proposal and a couple of demo PRs implementing the proposal. http://s.apache.org/context-fn I hope this proposal is more agreeable. On Wed, Sep 13, 2017 at 3:46 PM Kenneth Knowles wrote

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Kenneth Knowles
ValueProvider is global, PCollectionView is per-window, state is per-step/key/window, etc. So my unhappiness increases as we move through that list, adding more and more constraints on correct use, none of which are reflected in the API. Your description of "its context is an execution of the pipe

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Thanks! I think most of the issues you point out [validation, scheduling, prefetching] are in the area of wiring. I reiterate that they can be solved - both of the methods below will give the runner an answer to the low-level question "which DoFn will need which side inputs": 1) Providing withSid

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
On Wed, Sep 13, 2017 at 1:56 PM, Eugene Kirpichov wrote: > On Wed, Sep 13, 2017 at 1:44 PM Robert Bradshaw > wrote: > >> On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov >> wrote: >> > Hi Robert, >> > >> > Given the anticipated usage of this proposal in Java, I'm not sure the >> > Python approa

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Kenneth Knowles
I made some comments on https://issues.apache.org/jira/browse/BEAM-2950 which was filed to do a similar thing for State. Luke correctly pointed out that many of the points apply here as well. I said most of the same above, but I thought I'd pull them out again from that ticket and rephrase to apply

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
On Wed, Sep 13, 2017 at 1:44 PM Robert Bradshaw wrote: > On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov > wrote: > > Hi Robert, > > > > Given the anticipated usage of this proposal in Java, I'm not sure the > > Python approach you quoted is the right one. > > Perhaps not, but does that mean i

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov wrote: > Hi Robert, > > Given the anticipated usage of this proposal in Java, I'm not sure the > Python approach you quoted is the right one. Perhaps not, but does that mean it would be a Java-ism only or would we implement it in Python despite it

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Hi Robert, Given the anticipated usage of this proposal in Java, I'm not sure the Python approach you quoted is the right one. The main reason: I see how it works with Map/FlatMap, but what about cases like FileIO.write(), parameterized by several lambdas (element -> destination, destination -> f

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
+1 to reducing the amount of boilerplate for dealing with side inputs. I prefer the "NewDoFn" style of side inputs for consistency. The primary drawback seems to be lambda's incompatibility with annotations. This is solved in Python by letting all the first annotated argument of the process method

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Reuven Lax
On Wed, Sep 13, 2017 at 10:05 AM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hi, > > I agree with these concerns to an extent, however I think the advantage of > transparently letting any user code access side inputs, especially > including lambdas, is so great that we should find a

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Lukasz Cwik
Initially I was skeptical of the change but after seeing how many "context" objects were being created to solve issues of passing around references really showed that our API approach was problematic. Using this pattern allows us to get rid of things like CombineWithContext and the *new* SideInputA

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Hi, I agree with these concerns to an extent, however I think the advantage of transparently letting any user code access side inputs, especially including lambdas, is so great that we should find a way to address these concerns within the constraints of the pattern I'm proposing. See more below.

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Ben Chambers
One possible issue with this is that updating a thread local is likely to be much more expensive than passing an additional argument. Also, not all code called from within the DoFn will necessarily be in the same thread (eg., sometimes we create a pool of threads for doing work). It may be *more* c

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-06 Thread Eugene Kirpichov
Hi, On Wed, Sep 6, 2017 at 10:55 AM Kenneth Knowles wrote: > On Wed, Sep 6, 2017 at 8:15 AM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > > > The differences are: > > - The proposal in the doc allows wiring different side inputs to the same > > Supplier, but I'm not convinced

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-06 Thread Kenneth Knowles
On Wed, Sep 6, 2017 at 8:15 AM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > > The differences are: > - The proposal in the doc allows wiring different side inputs to the same > Supplier, but I'm not convinced that this is important - you can just as > easily call the constructor of y

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-06 Thread Eugene Kirpichov
Hi Luke, I think my proposal is very similar in style to the one in A New DoFn , so I disagree that it is introducing a divergence from that discussion. That code proposes: class MyNewDoFn

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-06 Thread Lukasz Cwik
My concern with the proposal is not the specifics of how it will work and more about it being yet another way on how our API is to be used even though we have a proposal [1] of an API style we were working towards in Java and Python. I would rather re-open that discussion now about what we want tha

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-06 Thread Kenneth Knowles
+0.75 because I'd like to bring up invalid pipelines. I had proposed side inputs as parameters to DoFn in https://s.apache.org/a-new-dofn (specifically at [1]) so the only place they are specified is in the graph construction, making the DoFn more reusable and errors impossible. I've actually been

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Eugene Kirpichov
Hm, I guess you're right - for outputs it could be indeed quite valuable to output to them without plumbing (e.g. outputting errors). Could be done perhaps via TupleTag.output()? (assuming the same TupleTag can not be reused to tag multiple PCollection's) For now I sent a PR for side input support

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Lukasz Cwik
I disagree, state may not care where it is used as well since a person may call a function which needs to store/retrieve state and instead of having the DoFn declare the StateSpec and then pass in the state implementation down into the function everywhere. Similarly for outputs, the internal functi

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Eugene Kirpichov
Hm, I think of these things (state, side outputs etc.), only side inputs make sense to access in arbitrary user callbacks without explicit knowledge of the surrounding transform - so only side inputs can be implicit like this. Ultimately we'll probably end up removing ProcessContext, and keeping o

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Lukasz Cwik
For API consistency reasons, it would be good if we did this holistically and expanded this approach to state, side outputs, ... so that a person can always call Something.get() to return something that they can access implementation wise. It will be confusing for our users to have many variations

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Eugene Kirpichov
Also, I think my approach is compatible with annotations and future removal of .withSideInputs if we annotate a field: final PCollectionView foo = ...; class MyDoFn { @SideInput PCollectionView foo = foo; ...foo.get()... } We can extract the accessed views from the DoFn instance using refl

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Eugene Kirpichov
Hi Luke, I know this (annotations) is the pattern we were considering for side inputs, but I no longer think it is the best way to access them. Annotations help getting rid of the .withSideInputs() call, but this is where their advantage ends. The advantages of the proposed approach are that it a

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Reuven Lax
Eugene's original email mentioned the use of lambdas, and I'm not sure whether the parameter type-annotation will work in the lambda case. I do think that type annotations are better in the other examples. BTW, why use the annotation just for validation? If side inputs are specified via parameter

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-05 Thread Lukasz Cwik
I believe we should follow the pattern that state uses and add a type annotation to link the side input definition to its usage directly. This would allow us to know that the side input was definitely being accessed and perform validation during graph construction for any used but unspecified side

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-04 Thread Eugene Kirpichov
Sure, here's how a modified (passing) MapElements unit test looks like, with usage of side inputs: @Test @Category(NeedsRunner.class) public void testMapBasicWithSideInput() throws Exception { * final PCollectionView foo =* *pipeline.apply("foo", Create.of("foo")).apply(View.asSin

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-04 Thread Reuven Lax
Can you provide a code snippet showing how this would look? On Sun, Sep 3, 2017 at 6:49 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > TL;DR Introduce method PCollectionView.get(), implemented as: get > thread-local ProcessContext and do c.sideInput(this). As a result, any user > l