Sounds good - thank you.

On Mon, Mar 30, 2020 at 11:54 AM Robert Bradshaw <[email protected]> wrote:
> On Mon, Mar 30, 2020 at 10:40 AM Joshua B. Harrison <[email protected]> wrote:
>
>> Thank you for getting back to me. I would be happy to help contribute -
>> has there been any discussion around this issue before?
>
> Udi has been pushing the type annotation work forward lately, though I
> don't know that he's looked into the multi-output case much, if at all.
> It'd be great if you could contribute!
>
>> At the least, I think it would be preferable to raise a not implemented
>> error in Python when encountering this case.
>
> Agreed.
>
>> It seems like multi-input for CoGroupByKey is represented as a Union of
>> all the component collection types. Would it make sense to do the same for
>> the output types? Is this a better discussion for the dev group?
>
> +1 to taking this to the dev group.
>
>> Thanks again for your time and help.
>>
>> Best,
>> Joshua
>>
>> On Mon, Mar 30, 2020 at 11:22 AM Robert Bradshaw <[email protected]> wrote:
>>
>>> That is correct, type hints unfortunately are not yet supported for
>>> multiple-output PTransforms.
>>>
>>> On Thu, Mar 26, 2020 at 10:05 PM Joshua B. Harrison <[email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I am working on adding type hints to my pipeline, and ran into an issue
>>>> with PTransforms that produce multiple, tagged outputs.
>>>>
>>>> My class looks like this:
>>>>
>>>>> @with_input_types(mytype.Data)
>>>>> @with_output_types(mytype.KeyedData)
>>>>> class DenormalizeData(ptransform.PTransform):
>>>>>   MAIN = 'denormalized'
>>>>>   SKIPPED = functions.DenormalizeData.SKIPPED
>>>>>
>>>>>   def expand(self, pcol: mytype.Data) -> mytype.KeyedPriceData:
>>>>>     return (pcol
>>>>>             | 'Denormalize PriceData' >> core.ParDo(
>>>>>                 functions.DenormalizeData()).with_outputs(
>>>>>                     self.SKIPPED, main=self.MAIN))
>>>>
>>>> Where functions.DenormalizeData is a core.DoFn. From what I can tell,
>>>> the type checking code here at
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L429
>>>> attempts to access pvalue_.element_type. But in this case, the pvalue is a
>>>> DoOutputsTuple
>>>> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L239),
>>>> which overrides __getattr__ to check for tag names. In this case,
>>>> element_type is not a valid tag, and I get the following partial stack:
>>>>
>>>>> "apache_beam_2_17_0/apache_beam/transforms/ptransform.py", line 401, in type_check_inputs_or_outputs
>>>>>     if pvalue_.element_type is None:
>>>>>   File "apache_beam_2_17_0/apache_beam/pvalue.py", line 241, in __getattr__
>>>>>     return self[tag]
>>>>>   File "apache_beam_2_17_0/apache_beam/pvalue.py", line 256, in __getitem__
>>>>>     tag, self._main_tag, self._tags))
>>>>> ValueError: Tag 'element_type' is neither the main tag 'denormalized' nor any of the tags ('skipped',)
>>>>
>>>> Is my diagnosis correct? Is this a known issue? Can type hints exist on
>>>> DoOutputsTuples?
>>>>
>>>> Thank you for your time and help.
>>>>
>>>> Best,
>>>> Joshua
>>>>
>>>> --
>>>> Joshua Harrison | Software Engineer | [email protected] | 404-433-0242
>>
>> --
>> Joshua Harrison | Software Engineer | [email protected] | 404-433-0242

--
Joshua Harrison | Software Engineer | [email protected] | 404-433-0242
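
For anyone following the thread, the failure mode described above can be reproduced without Beam. The sketch below is only an illustration under stated assumptions: TaggedOutputs is a hypothetical stand-in for a multi-output result such as DoOutputsTuple, and element_type_or_none is a hypothetical guard, not Beam's actual code or a proposed patch. It shows why a __getattr__ that forwards unknown names to a tag lookup turns an attribute probe into a ValueError, and that getattr() with a default does not help, since the default only covers AttributeError.

```python
class TaggedOutputs:
    """Hypothetical stand-in for a multi-output result (not Beam's class)."""

    def __init__(self, main_tag, tags):
        self._main_tag = main_tag
        self._tags = tuple(tags)

    def __getattr__(self, tag):
        # Called only when normal attribute lookup fails.
        # Dunder names get a regular AttributeError so that protocols
        # such as copy and pickle keep working.
        if tag.startswith('__'):
            raise AttributeError(tag)
        # Every other unknown attribute is treated as an output tag.
        return self[tag]

    def __getitem__(self, tag):
        if tag != self._main_tag and tag not in self._tags:
            raise ValueError(
                'Tag %r is neither the main tag %r nor any of the tags %r'
                % (tag, self._main_tag, self._tags))
        # A real implementation would return a PCollection here.
        return 'output for %s' % tag


def element_type_or_none(pvalue_):
    """Hypothetical guard: treat a failed tag lookup as 'no element type'."""
    try:
        return pvalue_.element_type
    except (AttributeError, ValueError):
        return None


outputs = TaggedOutputs('denormalized', ['skipped'])
print(outputs.skipped)                # a valid tag resolves normally
print(element_type_or_none(outputs))  # the guard swallows the ValueError
```

A guard like this only hides the error; it does not give tagged outputs a type, which is the open question for the dev list.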
