Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-15 Thread Luke Cwik
With splittable DoFns, we should be aiming to have 'sources' take PCollections for input as the default implementation. The common 'Read' with no inputs will still exist and it would make sense to have PBegin. We have found that most sinks actually could have meaningful output (e.g. how many recor

Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-14 Thread Robert Bradshaw
SGTM.

On Tue, Jul 14, 2020 at 5:28 PM Udi Meiri wrote:
> So it sounds like we should:
> - Make PBegin public
> - Deprecate PDone return type in favor of None
> - Update the programming guide's Composite Transforms section.
>
> On Tue, Jul 14, 2020 at 5:13 PM Robert Burke wrote:
>> For con

Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-14 Thread Robert Bradshaw
PBegin is somewhat analogous to Go's Pipeline.Root() scope, though Go handles transforms (composite and otherwise) quite differently.

On Tue, Jul 14, 2020 at 5:13 PM Robert Burke wrote:
> For contrast, the Go SDK provides an Impulse transform directly (analogous to PBegin, part of the model) an

Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-14 Thread Udi Meiri
So it sounds like we should:
- Make PBegin public
- Deprecate PDone return type in favor of None
- Update the programming guide's Composite Transforms section.

On Tue, Jul 14, 2020 at 5:13 PM Robert Burke wrote:
> For contrast, the Go SDK provides an Impulse transform directly (analogous to P

Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-14 Thread Robert Burke
For contrast, the Go SDK provides an Impulse transform directly (analogous to PBegin, part of the model) and has a ParDo0 (which, like PDone, has no output PCollections). The numeral suffix on the Go ParDo functions indicates the number of output PCollections expected from the passed-in DoFn. On

Re: [PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-14 Thread Robert Bradshaw
Yes, PBegin and PDone are used in the SDKs, but are not part of the model. I would be supportive of making PBegin more public to denote that a transform is a "root" of the pipeline. PDone was required for Java, however I don't think there's any use for it in the Python SDK (a transform can simply

[PROPOSAL] Make PBegin and PDone public in the Python SDK

2020-07-13 Thread Udi Meiri
Details: One item of interest that came up during the implementation of BEAM-10258 [1] is how to treat PTransforms that act like sources or sinks. These transforms have no input PCollections or no output PCollections, respectively. Internally, we use PBegin and PDone to denote this (e.g., [2]). IIUC, PBegin
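
To make the source/sink distinction in this thread concrete, here is a minimal sketch in plain Python. It does not use the Beam SDK: the classes PBegin, PCollection, ReadNumbers and WriteToList below are hypothetical stand-ins written only for illustration, and the real apache_beam classes behave differently in many respects. The sketch assumes the direction suggested in the replies, namely that a source-like transform takes a PBegin marker as input and a sink-like transform returns None rather than a PDone.

```python
# Toy stand-ins for illustration only; these are NOT the apache_beam classes.

class PBegin:
    """Hypothetical marker meaning 'no input': the root of a pipeline."""


class PCollection:
    """Hypothetical in-memory collection of elements."""

    def __init__(self, elements):
        self.elements = list(elements)


class ReadNumbers:
    """Source-like transform: accepts PBegin and produces a PCollection."""

    def expand(self, pbegin):
        # A root transform has no input PCollection, so it only accepts PBegin.
        if not isinstance(pbegin, PBegin):
            raise TypeError("ReadNumbers must be applied at the pipeline root")
        return PCollection([1, 2, 3])


class WriteToList:
    """Sink-like transform: consumes a PCollection and returns None."""

    def __init__(self, target):
        self.target = target

    def expand(self, pcoll):
        # No output PCollection; None is returned instead of a PDone marker.
        self.target.extend(pcoll.elements)
        return None


sink = []
numbers = ReadNumbers().expand(PBegin())
result = WriteToList(sink).expand(numbers)
```

In this sketch the type of the input argument is what marks ReadNumbers as a root, which is the role the thread proposes for a public PBegin. The sink needs no dedicated marker type at all, since returning None already says there is nothing to consume downstream.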