Re: inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)
Thanks. I've filed https://issues.apache.org/jira/browse/BEAM-11882 . If you want to take a stab at fixing it, you could try replacing the argument passed to merge_accumulators at https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/transforms/combiners.py#L963 with a new object whose __iter__ method returns iter(accumulators), and then create a pull request.

On Wed, Feb 24, 2021 at 2:45 PM Stephen Dewey wrote:
> Oh, I forgot to mention that I am using SDK 2.27.0 and Python 3.8
>
> On Wed, Feb 24, 2021 at 5:27 PM Stephen Dewey wrote:
>
>> Hi, I am reporting a minor bug.
>>
>> Based on this answer by Pablo:
>> https://stackoverflow.com/a/42283279/783314
>>
>> It appears that you want to always have an _UnwindowedValues in
>> DirectRunner whenever it exists in DataflowRunner, to provide consistency
>> between the two.
>>
>> What I have noticed is that if you subclass beam.CombineFn in Python,
>> the accumulators received by the merge_accumulators method (as its
>> argument) will be _UnwindowedValues in DataflowRunner, but not in
>> DirectRunner. This leads to an error if somebody passes that value to, say,
>> len(). The error will be: TypeError: object of type '_UnwindowedValues'
>> has no len()
>>
>> Hope this helps!
>> Stephen
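For readers following along, the suggested fix can be illustrated without Beam itself. The sketch below is only an illustration under stated assumptions: `IterableOnly` and the standalone `merge_accumulators` function are hypothetical names, not part of the Beam codebase, and the (sum, count) accumulator shape is made up for the example. It shows a wrapper whose `__iter__` returns `iter(accumulators)`, and a merge routine that only iterates, so it behaves the same whether it receives a list or an iteration-only object.

```python
class IterableOnly:
    """Hypothetical wrapper that exposes iteration only (no len(), no indexing)."""

    def __init__(self, values):
        self._values = values

    def __iter__(self):
        return iter(self._values)


def merge_accumulators(accumulators):
    """Merge (sum, count) accumulators using iteration only."""
    total, count = 0, 0
    for acc_sum, acc_count in accumulators:
        total += acc_sum
        count += acc_count
    return total, count


# Wrapping a plain list hides len(), so code that relies on it fails early.
wrapped = IterableOnly([(3, 1), (7, 2)])
merged = merge_accumulators(wrapped)

try:
    len(wrapped)
    len_supported = True
except TypeError:
    len_supported = False
```

A CombineFn written this way should not hit the `len()` TypeError described in the report. If a count is really needed, materializing with `list(accumulators)` first is one option, at the cost of holding all accumulators in memory.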
Beam College webinar series invitation
Hello Apache Beam Community,

You are invited to improve your data processing skills with the *Beam College* webinars!

If you know about Apache Beam but haven’t used it in production yet, or you want to learn best practices to optimize your Beam pipelines, then Beam College is for you!

Beam College is a *free 5-day webinar series* designed to be flexible, so you can sign up and drop in based on the topics that match your interests and needs. Don’t miss the opportunity to learn practical tips, experience interactive demos and engage with our Beam experts!

Some of the topics we’ll cover:

- Introduction to the data processing ecosystem
- Advanced distributed data processing with Apache Beam
- Features to scale and productionalize your business case
- Strategies for performance and cost optimization
- Best practices for debugging Beam pipelines

Check out the full curriculum at: https://beamcollege.dev/all-courses/

--
Mara Ruvalcaba
COO, SG Software Guru & Nearshore Link
USA: 512 296 2884
MX: 55 5239 5502
Re: Details on Beam Jira Bot
Hi Konstantin,

I don't think there's any documentation about it, but there was a discussion on dev@ [1]. Does that help?

Brian

[1] https://lists.apache.org/thread.html/rb51dfffbc8caf40efe7e1d137402438a05d0375fd945bda8fd7e33d2%40%3Cdev.beam.apache.org%3E

On Fri, Feb 26, 2021 at 1:25 AM Konstantin Knauf wrote:
> Dear Beam Community,
>
> I am looking for details about the rules that the Beam Jira Bot follows.
> Are these documented somewhere or has there ever been a public discussion
> on this? I was not able to find something in the wiki, mailing list or
> website. Context: I am thinking about proposing something similar to the
> Apache Flink Community.
>
> Thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
Re: Should we support VCF IO on Python 3?
Thank you for your replies. Given the opinions so far, it seems better to remove VCF IO from the codebase for now. When removing it, we also need to remove the description from the documentation that Ahmet mentioned in his comment at https://issues.apache.org/jira/browse/BEAM-5628.

On Wed, Feb 24, 2021 at 2:31 AM Cory McLean wrote:
>
> +1 to removing from the codebase, and if it becomes of interest again,
> porting to cyvcf2. But most genomics workflows are not using Beam at the
> moment.
>
> On Tue, Feb 23, 2021 at 1:12 AM Chamikara Jayalath wrote:
>>
>> Given that we don't support Python 2 anymore, it sounds like this is just
>> broken code and we cannot expect anybody to be using it (after Beam 2.24.0).
>> If so +1 for removing it from the codebase. If we decide to add it back with
>> Python3 support, we should be able to refer to (working) 2.24.0
>> implementation.
>>
>> Thanks,
>> Cham
>>
>> On Mon, Feb 22, 2021 at 5:17 PM Valentyn Tymofieiev wrote:
>>>
>>> Hi Yoshiki,
>>>
>>> If switching the code to a new version of VCF package is something easy to
>>> do, I would keep the code, but keep the dependency on vcf packages
>>> optional, since we know that this code is not in use. If you decide to try
>>> this route, https://issues.apache.org/jira/browse/BEAM-5628 mentions
>>> cyvcf2 as a possible replacement.
>>>
>>> If replacement is not trivial and/or nobody is interested in making it
>>> work, I would remove this IO.
>>>
>>> CC'ing a few folks who may have an opinion: +Chamikara Jayalath +Cory
>>> McLean .
>>>
>>> Thanks for your help with the cleanup!
>>>
>>> On Sun, Feb 21, 2021 at 4:23 AM Yoshiki Obata wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm cleaning up Python 2 codepath now and find that VCF IO codes still remain though they might not work properly with latest Beam because they depend on PyVCF which does not support Python 3.
>>>>
>>>> According to comments in vcfio.py, migrating to Nucleus is expected, but it is concluded that the plan is not the right option at the comment of https://issues.apache.org/jira/browse/BEAM-5628
>>>>
>>>> Now, it would be needed to decide which should we do for VCF IO - drop support or maintain support using another vcf package. Would anyone have a basis for the decision?
>>>>
>>>> Yoshiki
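For reference, Valentyn's alternative in this thread (keeping the code but making the dependency on the vcf package optional) could be sketched roughly as below. This is not how vcfio.py is written; `optional_import` and `require_vcf_support` are hypothetical helper names, and cyvcf2 appears only because BEAM-5628 mentions it as a possible replacement.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# cyvcf2 is the candidate replacement named in BEAM-5628; it may be absent.
cyvcf2 = optional_import("cyvcf2")


def require_vcf_support(module=cyvcf2):
    """Raise a clear error only when VCF functionality is actually used."""
    if module is None:
        raise ImportError(
            "VCF IO requires the optional 'cyvcf2' package; "
            "install it to use this IO."
        )
    return module
```

With this pattern, importing the IO module never fails for users who do not have the package installed; the error surfaces only when someone calls into the VCF code path.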
Details on Beam Jira Bot
Dear Beam Community,

I am looking for details about the rules that the Beam Jira Bot follows. Are these documented somewhere or has there ever been a public discussion on this? I was not able to find something in the wiki, mailing list or website. Context: I am thinking about proposing something similar to the Apache Flink Community.

Thank you,

Konstantin

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk