Re: inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)

2021-02-26 Thread Robert Bradshaw
Thanks. I've filed https://issues.apache.org/jira/browse/BEAM-11882 .

If you want to take a stab at fixing it, you could try replacing the
argument passed to merge_accumulators at
https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/transforms/combiners.py#L963
with a new object whose __iter__ method returns iter(accumulators) and
create a pull request.


On Wed, Feb 24, 2021 at 2:45 PM Stephen Dewey 
wrote:

> Oh, I forgot to mention that I am using SDK 2.27.0 and Python 3.8
>
> On Wed, Feb 24, 2021 at 5:27 PM Stephen Dewey 
> wrote:
>
>> Hi, I am reporting a minor bug.
>>
>> Based on this answer by Pablo:
>> https://stackoverflow.com/a/42283279/783314
>>
>> It appears that you want to always have an _UnwindowedValues in
>> DirectRunner whenever it exists in DataflowRunner, to provide consistency
>> between the two.
>>
>> What I have noticed is that if you subclass beam.CombineFn in Python,
>> the accumulators received by the merge_accumulators method (as its
>> argument) will be _UnwindowedValues in DataflowRunner, but not in
>> DirectRunner. On DataflowRunner this leads to an error if somebody passes
>> that value to, say, len(). The error will be:
>> TypeError: object of type '_UnwindowedValues' has no len()
>>
>> Hope this helps!
>> Stephen
>>
>
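
Until the runners agree, user code can avoid the problem by only iterating
over the argument of merge_accumulators. The sketch below is an
illustration under assumptions: MeanFn is a hypothetical name, and to stay
self-contained the class mirrors the beam.CombineFn method names without
subclassing it.

```python
class MeanFn:
    """Hypothetical mean combiner mirroring the beam.CombineFn method names."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, element count)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return total + element, count + 1

    def merge_accumulators(self, accumulators):
        # Only iterate: never call len() on, or index into, `accumulators`.
        total, count = 0.0, 0
        for part_total, part_count in accumulators:
            total += part_total
            count += part_count
        return total, count

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


fn = MeanFn()
parts = [(3.0, 2), (9.0, 1)]
# A generator has no len(), so it stands in for an iterable-only argument.
merged = fn.merge_accumulators(p for p in parts)
mean = fn.extract_output(merged)
```

If the number of accumulators is really needed, materializing them first
with list(accumulators) gives an object that supports len().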


Beam College webinar series invitation

2021-02-26 Thread Mara Ruvalcaba

Hello Apache Beam Community,

You are invited to improve your data processing skills with the *Beam 
College* webinars!


If you know about Apache Beam but haven’t used it in production yet, or 
you want to learn best practices to optimize your Beam pipelines, then 
Beam College is for you!


Beam College is a *free 5-day webinar series* designed to be flexible, 
so you can sign up and drop in based on the topics that match your 
interests and needs. Don’t miss the opportunity to learn practical tips, 
experience interactive demos and engage with our Beam experts!


Some of the topics we’ll cover:

    Introduction to the Data processing ecosystem
    Advanced distributed data processing with Apache Beam
    Features to scale and productionalize your business case
    Strategies for performance and cost optimization
    Best practices for debugging Beam pipelines

Check out the full curriculum at: https://beamcollege.dev/all-courses/


--
Mara Ruvalcaba
COO, SG Software Guru & Nearshore Link
USA: 512 296 2884
MX: 55 5239 5502



Re: Details on Beam Jira Bot

2021-02-26 Thread Brian Hulette
Hi Konstantin,

I don't think there's any documentation about it, but there was a
discussion on dev@ [1]. Does that help?

Brian

[1]
https://lists.apache.org/thread.html/rb51dfffbc8caf40efe7e1d137402438a05d0375fd945bda8fd7e33d2%40%3Cdev.beam.apache.org%3E

On Fri, Feb 26, 2021 at 1:25 AM Konstantin Knauf  wrote:

> Dear Beam Community,
>
> I am looking for details about the rules that the Beam Jira Bot follows.
> Are these documented somewhere or has there ever been a public discussion
> on this? I was not able to find anything in the wiki, mailing list or
> website. Context: I am thinking about proposing something similar to the
> Apache Flink Community.
>
> Thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


Re: Should we support VCF IO on Python 3?

2021-02-26 Thread Yoshiki Obata
Thank you for your reply.

Considering the opinions so far, it would be better to remove VCF IO from
the codebase for now.
When removing it, we also need to remove its description from the
documentation that Ahmet mentioned in his comment at
https://issues.apache.org/jira/browse/BEAM-5628.

On Wed, Feb 24, 2021 at 2:31 AM Cory McLean  wrote:
>
> +1 to removing from the codebase, and if it becomes of interest again, 
> porting to cyvcf2. But most genomics workflows are not using Beam at the 
> moment.
>
> On Tue, Feb 23, 2021 at 1:12 AM Chamikara Jayalath  
> wrote:
>>
>> Given that we don't support Python 2 anymore, it sounds like this is just 
>> broken code and we cannot expect anybody to be using it (after Beam 2.24.0).
>> If so +1 for removing it from the codebase. If we decide to add it back with 
>> Python3 support, we should be able to refer to (working) 2.24.0 
>> implementation.
>>
>> Thanks,
>> Cham
>>
>> On Mon, Feb 22, 2021 at 5:17 PM Valentyn Tymofieiev  
>> wrote:
>>>
>>> Hi Yoshiki,
>>>
>>> If switching the code to a new version of VCF package is something easy to 
>>> do, I would keep the code, but keep the dependency on vcf packages 
>>> optional, since we know that this code is not in use.  If you decide to try 
>>> this route,  https://issues.apache.org/jira/browse/BEAM-5628 mentions 
>>> cyvcf2 as a possible replacement.
>>>
>>> If replacement is not trivial and/or nobody is interested in making it 
>>> work, I would remove this IO.
>>>
>>> CC'ing a few folks who may have an opinion: +Chamikara Jayalath +Cory 
>>> McLean .
>>>
>>> Thanks for your help with the cleanup!
>>>
>>> On Sun, Feb 21, 2021 at 4:23 AM Yoshiki Obata  
>>> wrote:

 Hi all,

 I'm cleaning up Python 2 codepath now and find that VCF IO codes still
 remain though they might not work properly with latest Beam because
 they depend on PyVCF which does not support Python 3.
 According to comments in vcfio.py, migrating to Nucleus is expected,
 but it is concluded that the plan is not the right option at the
 comment of https://issues.apache.org/jira/browse/BEAM-5628

 Now, it would be needed to decide which should we do for VCF IO - drop
 support or maintain support using another vcf package.
 Would anyone have a basis for the decision?

 Yoshiki
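
Valentyn's suggestion earlier in this thread, keeping the VCF dependency
optional, is commonly done with a guarded import. A minimal sketch,
assuming cyvcf2 (the replacement mentioned in BEAM-5628) as the optional
package. The helper names optional_import and read_vcf_records are
hypothetical and not Beam APIs, and cyvcf2.VCF(path) is assumed here as
that package's reader entry point.

```python
import importlib


def optional_import(module_name):
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Resolved once at import time; stays None when cyvcf2 is not installed.
cyvcf2 = optional_import("cyvcf2")


def read_vcf_records(path):
    """Hypothetical reader: fails with a clear message if cyvcf2 is absent."""
    if cyvcf2 is None:
        raise ImportError(
            "VCF support requires the optional 'cyvcf2' package.")
    # Assumption: cyvcf2.VCF(path) yields one variant record per iteration.
    return iter(cyvcf2.VCF(path))
```

With this pattern the rest of the package imports cleanly without the
dependency, and only callers of the VCF reader see an error.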


Details on Beam Jira Bot

2021-02-26 Thread Konstantin Knauf
Dear Beam Community,

I am looking for details about the rules that the Beam Jira Bot follows.
Are these documented somewhere or has there ever been a public discussion
on this? I was not able to find anything in the wiki, mailing list or
website. Context: I am thinking about proposing something similar to the
Apache Flink Community.

Thank you,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk