Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Sam Rohde
On Wed, Apr 1, 2020 at 3:34 PM Robert Bradshaw wrote: > E.g. something like https://github.com/apache/beam/pull/11283 > > That looks good. Though it may be better to do the change under the "passthrough_pcollection_output_ids" experiment flag above it (I'll comment on this in the PR too). > On

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
E.g. something like https://github.com/apache/beam/pull/11283 On Wed, Apr 1, 2020 at 2:57 PM Robert Bradshaw wrote: > On Wed, Apr 1, 2020 at 1:48 PM Sam Rohde wrote: > >> To restate the original issue it is that the current method of setting >> the output tags on PCollections from composites

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
On Wed, Apr 1, 2020 at 1:48 PM Sam Rohde wrote: > To restate the original issue it is that the current method of setting the > output tags on PCollections from composites drops the tag information of > the returned PCollections. > Composite PTransforms should *not* be setting output tags on

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Sam Rohde
To restate the original issue it is that the current method of setting the output tags on PCollections from composites drops the tag information of the returned PCollections. So an expand returning a dict will have its outputs labeled as None, 1, ..., len(outputs). This is broken because embedded

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
I'm -1 on this, it adds additional restrictions and I don't see what this buys us. (In particular, it doesn't address the original issue.) On Wed, Apr 1, 2020 at 12:41 PM Sam Rohde wrote: > So then how does the proposal sound? > > Writing again here: > PTransform.expand: (...) -> Union[PValue,

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik wrote: > > It is important that composites know how things are named so that any > embedded payloads in the composite PTransform can reference the outputs > appropriately. Very good point. This is part of the cleanup to treat inputs and outputs of

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Luke Cwik
It is important that composites know how things are named so that any embedded payloads in the composite PTransform can reference the outputs appropriately. On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw wrote: > On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote: > >>> > >>> * Don't allow

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote: >>> >>> * Don't allow arbitrary nestings returned during expansion, force composite >>> transforms to always provide an unambiguous name (either a tuple with >>> PCollections with unique tags or a dictionary with untagged PCollections or >>> a

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 24, 2020 at 1:07 PM Sam Rohde wrote: > > Hi All, > > Problem > I would like to discuss BEAM-9322 and the correct way to set the output tags > of a transform with nested PCollections, e.g. a dict of PCollections, a tuple > of dicts of PCollections. Before the fixing of BEAM-1833, the

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Sam Rohde
> > * Don't allow arbitrary nestings returned during expansion, force >> composite transforms to always provide an unambiguous name (either a tuple >> with PCollections with unique tags or a dictionary with untagged >> PCollections or a singular PCollection (Java and Go SDKs do this)). >> > > I

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-26 Thread Udi Meiri
On Thu, Mar 26, 2020 at 10:13 AM Luke Cwik wrote: > The issue seems to be that a PCollection can have a "tag" associated with > it and PTransform expansion can return an arbitrary nested dictionary/tuple > yet we need to figure out what the user wanted as the local name for the > PCollection

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-26 Thread Luke Cwik
The issue seems to be that a PCollection can have a "tag" associated with it and PTransform expansion can return an arbitrary nested dictionary/tuple yet we need to figure out what the user wanted as the local name for the PCollection from all this information. Will this break people who rely on

[BEAM-9322] Python SDK discussion on correct output tag names

2020-03-24 Thread Sam Rohde
Hi All, *Problem* I would like to discuss BEAM-9322 and the correct way to set the output tags of a transform with nested PCollections, e.g. a dict of PCollections, a tuple of dicts of PCollections. Before the fixing of BEAM-1833