Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-02-14 Thread Jean-Baptiste Onofré
Hi,

I don't have concerns: if Beam is OK with each issue having a single
milestone, that's fine with me ;)

Thanks for the detailed document, it helps!

Regards
JB

On Tue, Feb 15, 2022 at 6:52 AM Aizhamal Nurmamat kyzy 
wrote:

> Very humbly, I think the benefits of moving to GitHub Issues outweigh the
> shortcomings.
>
> Jan, Kenn, Alexey, JB: adding you directly as you had some concerns.
> Please let us know whether they were addressed by the options that we
> described in the doc [1].
>
> If no one objects, I can start working with some of you on the Migration
> TODOs outlined in the doc I am referencing.
>
>
> [1]
> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#bookmark=id.izn35w5gsjft
>
> On Thu, Feb 10, 2022 at 1:12 PM Danny McCormick 
> wrote:
>
>> I'm definitely +1 on moving to help make the bar for entry lower for new
>> contributors (like myself!)
>>
>> Thanks,
>> Danny
>>
>> On Thu, Feb 10, 2022 at 2:32 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> I think we've had a chance to discuss shortcomings and advantages. I
>>> think each person may have a different bias / preference. My bias is to
>>> move to Github, to have a more inclusive, approachable project despite the
>>> differences in workflow. So I'm +1 on moving.
>>>
>>> Could others share their bias? Don't think of this as a vote, but I'd
>>> like to get a sense of people's preferences, to see if there's a
>>> strong/slight feeling either way.
>>>
>>> Again, the sticky points are summarized here [1], feel free to add to
>>> the doc.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#
>>>
>>>
>>> On Mon, Jan 31, 2022 at 7:23 PM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
>>>> Welcome to the Beam community, Danny!
>>>>
>>>> We would love your help if/when we end up migrating.
>>>>
>>>> Please add your comments to the doc I shared[1], in case we missed some
>>>> cool GH features that could be helpful. Thanks!
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#
>>>>
>>>> On Mon, Jan 31, 2022, 10:06 AM Danny McCormick <
>>>> dannymccorm...@google.com> wrote:
>>>>
>>>>> > Then (this is something you'd have to code) you could easily write or
>>>>> > use an existing GithubAction or bot that will assign the labels based on
>>>>> > the initial selection done by the user at entry. We have not done it yet
>>>>> > but we might.
>>>>>
>>>>> Hey, new contributor here - wanted to chime in with a shameless plug
>>>>> because I happen to have written an action that does pretty much exactly
>>>>> what you're describing[1] and could be extensible to the use case discussed
>>>>> here - it should basically just require writing some config (example in
>>>>> action[2]). In general, automated management of labels based on the initial
>>>>> issue description + content isn't too hard; it does get significantly
>>>>> trickier (but definitely still possible) if you try to automate labels
>>>>> based on responses or edits.
>>>>>
>>>>> Also, big +1 that the easy integration with Actions is a significant
>>>>> advantage of using issues since it helps keep your automations in one place
>>>>> (or at least fewer places) and gives you a lot of tools out of the box both
>>>>> from the community and from the Actions org. *Disclaimer:* I am
>>>>> definitely biased. Until 3 weeks ago I was working on the Actions team at
>>>>> GitHub.
>>>>>
>>>>> I'd be happy to help with some of the issue automation if we decide
>>>>> that would be helpful, whether that's reusing existing work or tailoring it
>>>>> more exactly to the Beam use case.
>>>>>
>>>>> [1] https://github.com/damccorm/tag-ur-it
>>>>> [2] https://github.com/microsoft/azure-pipelines-tasks/issues/15839
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>> On Mon, Jan 31, 2022 at 12:49 PM Zachary Houfek wrote:
>>>>>
>>>>>> > You can link PR to the issue by just mentioning #Issue in the commit
>>>>>> > message. If you do not prefix it with "Closes:" "Fixes:" or similar it
>>>>>> > will be just linked
>>>>>>
>>>>>> Ok, thanks for the clarification there.
>>>>>>
>>>>>> Regards,
>>>>>> Zach
>>>>>>
>>>>>> On Mon, Jan 31, 2022 at 12:43 PM Cristian Constantinescu <
>>>>>> zei...@gmail.com> wrote:
>>>>>>
>>>>>>> I've been semi-following this thread, apologies if this has been
>>>>>>> raised already.
>>>>>>>
>>>>>>> From a user point of view, in some corporate environments (that I've
>>>>>>> worked at), GitHub is blocked. That includes the issues part. The Apache
>>>>>>> Jira is not blocked and does at times provide value while investigating
>>>>>>> issues.
>>>>>>>
>>>>>>> Obviously, users stuck in those unfortunate circumstances can just
>>>>>>> use their personal device. Not advocating any direction on the matter,
>>>>>>> just putting this out there.
>>>>>>>
>>>>>>> On Mon, Jan 31, 2022 at 12:21 PM Zachary

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-02-14 Thread Aizhamal Nurmamat kyzy
Very humbly, I think the benefits of moving to GitHub Issues outweigh the
shortcomings.

Jan, Kenn, Alexey, JB: adding you directly as you had some concerns.
Please let us know whether they were addressed by the options that we
described in the doc [1].

If no one objects, I can start working with some of you on the Migration
TODOs outlined in the doc I am referencing.


[1]
https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#bookmark=id.izn35w5gsjft

On Thu, Feb 10, 2022 at 1:12 PM Danny McCormick 
wrote:

> I'm definitely +1 on moving to help make the bar for entry lower for new
> contributors (like myself!)
>
> Thanks,
> Danny
>
> On Thu, Feb 10, 2022 at 2:32 PM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Hi all,
>>
>> I think we've had a chance to discuss shortcomings and advantages. I
>> think each person may have a different bias / preference. My bias is to
>> move to Github, to have a more inclusive, approachable project despite the
>> differences in workflow. So I'm +1 on moving.
>>
>> Could others share their bias? Don't think of this as a vote, but I'd
>> like to get a sense of people's preferences, to see if there's a
>> strong/slight feeling either way.
>>
>> Again, the sticky points are summarized here [1], feel free to add to the
>> doc.
>>
>> [1]
>> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#
>>
>>
>> On Mon, Jan 31, 2022 at 7:23 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Welcome to the Beam community, Danny!
>>>
>>> We would love your help if/when we end up migrating.
>>>
>>> Please add your comments to the doc I shared[1], in case we missed some
>>> cool GH features that could be helpful. Thanks!
>>>
>>> [1]
>>> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#
>>>
>>> On Mon, Jan 31, 2022, 10:06 AM Danny McCormick <
>>> dannymccorm...@google.com> wrote:
>>>
>>>> > Then (this is something you'd have to code) you could easily write or
>>>> > use an existing GithubAction or bot that will assign the labels based on
>>>> > the initial selection done by the user at entry. We have not done it yet
>>>> > but we might.
>>>>
>>>> Hey, new contributor here - wanted to chime in with a shameless plug
>>>> because I happen to have written an action that does pretty much exactly
>>>> what you're describing[1] and could be extensible to the use case discussed
>>>> here - it should basically just require writing some config (example in
>>>> action[2]). In general, automated management of labels based on the initial
>>>> issue description + content isn't too hard; it does get significantly
>>>> trickier (but definitely still possible) if you try to automate labels
>>>> based on responses or edits.
>>>>
>>>> Also, big +1 that the easy integration with Actions is a significant
>>>> advantage of using issues since it helps keep your automations in one place
>>>> (or at least fewer places) and gives you a lot of tools out of the box both
>>>> from the community and from the Actions org. *Disclaimer:* I am
>>>> definitely biased. Until 3 weeks ago I was working on the Actions team at
>>>> GitHub.
>>>>
>>>> I'd be happy to help with some of the issue automation if we decide
>>>> that would be helpful, whether that's reusing existing work or tailoring it
>>>> more exactly to the Beam use case.
>>>>
>>>> [1] https://github.com/damccorm/tag-ur-it
>>>> [2] https://github.com/microsoft/azure-pipelines-tasks/issues/15839
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> On Mon, Jan 31, 2022 at 12:49 PM Zachary Houfek wrote:
>>>>
>>>>> > You can link PR to the issue by just mentioning #Issue in the commit
>>>>> > message. If you do not prefix it with "Closes:" "Fixes:" or similar it
>>>>> > will be just linked
>>>>>
>>>>> Ok, thanks for the clarification there.
>>>>>
>>>>> Regards,
>>>>> Zach
>>>>>
>>>>> On Mon, Jan 31, 2022 at 12:43 PM Cristian Constantinescu <
>>>>> zei...@gmail.com> wrote:
>>>>>
>>>>>> I've been semi-following this thread, apologies if this has been
>>>>>> raised already.
>>>>>>
>>>>>> From a user point of view, in some corporate environments (that I've
>>>>>> worked at), GitHub is blocked. That includes the issues part. The Apache
>>>>>> Jira is not blocked and does at times provide value while investigating
>>>>>> issues.
>>>>>>
>>>>>> Obviously, users stuck in those unfortunate circumstances can just
>>>>>> use their personal device. Not advocating any direction on the matter,
>>>>>> just putting this out there.
>>>>>>
>>>>>> On Mon, Jan 31, 2022 at 12:21 PM Zachary Houfek wrote:
>>>>>>
>>>>>>> I added a suggestion that I don't think was discussed here:
>>>>>>>
>>>>>>> I know that we currently can link multiple PRs to a single Jira, but
>>>>>>> GitHub assumes a PR linked to an issue fixes the issue. You also need
>>>>>>> write access to the repository to link the PR outside of using a "closing
>>>>>>> keyword". (For reference: Linking a

Need Help

2022-02-14 Thread Pramod Kumar
Hi there,
Hope everyone is doing well.
I am stuck on a problem where I have to merge two PCollections, one of which
is pretty huge.
When I merge them using CoGroupByKey, the workers run out of memory.
I have tried processing in batches and using a local fixed window, but
nothing has worked well for me.
Can someone suggest a better way to merge such huge PCollections
without running out of memory?

I appreciate your help.
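
One option that is often suggested for this kind of join, sketched here under assumptions the message does not confirm: that the smaller collection fits in worker memory and that its keys are unique. In that case the small side can be held as an in-memory lookup while the huge side is streamed past it one element at a time, so nothing has to be grouped per key. The function name `join_with_lookup` and the sample data are hypothetical and only illustrate the idea in plain Python.

```python
from typing import Dict, Iterator, Tuple


def join_with_lookup(
    element: Tuple[str, str], lookup: Dict[str, int]
) -> Iterator[Tuple[str, Tuple[str, int]]]:
    """Inner-join one element of the large side against an in-memory dict.

    Assumption: `lookup` holds the whole small side and has unique keys.
    """
    key, value = element
    if key in lookup:
        # Emit the key together with the values from both sides.
        yield key, (value, lookup[key])


# Hypothetical sample data, for illustration only.
small = {"a": 1, "b": 2}
large = [("a", "x"), ("c", "y"), ("b", "z")]

# The large side is processed one element at a time; only `small` is held
# in memory.
joined = [row for element in large for row in join_with_lookup(element, small)]
print(joined)
```

In a Beam Python pipeline the same function could be applied as `beam.FlatMap(join_with_lookup, lookup=beam.pvalue.AsDict(small_pcoll))`, where `small_pcoll` is a hypothetical name for the smaller PCollection of key-value pairs. If neither input fits in memory, this approach does not apply.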


Re: [RFC][Design] Automate Reviewer Assignment

2022-02-14 Thread Robert Burke
+1 great proposal.

On Mon, Feb 14, 2022, 2:33 PM Kenneth Knowles  wrote:

> Yea, great proposal. I expect we'll discover further refinements through
> experience more than deliberation, so I don't have any more comments on the
> doc.
>
> Kenn
>
> On Mon, Feb 14, 2022 at 9:04 AM Kerry Donny-Clark 
> wrote:
>
>> Thanks Danny, we can try this out and update as well. Everyone, please
>> let us know how this is working in practice once we roll it out.
>> Kerry
>>
>> On Mon, Feb 14, 2022 at 11:23 AM Danny McCormick <
>> dannymccorm...@google.com> wrote:
>>
>>> Thank you everyone who has chimed in here or on the doc - there's been a
>>> lot of good discussion and I think that will lead to a much better outcome!
>>>
>>> Since there's been general support for the idea and the flow of new
>>> comments tapered off a bit before the weekend, I'm going to go ahead and
>>> start to move forward with building out the automation (tracking JIRA here
>>> - https://issues.apache.org/jira/browse/BEAM-13925). Please feel free
>>> to leave any more thoughts in the doc and I promise I will respond/work to
>>> incorporate any thoughts that merit tweaking the design.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Fri, Feb 11, 2022 at 12:34 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> This looks like a great plan! I remember being disappointed when
>>>> CODEOWNERs didn't meet our needs, but this looks like it resolves all
>>>> those issues.
>>>>
>>>> On Fri, Feb 11, 2022 at 9:02 AM Chamikara Jayalath <
>>>> chamik...@google.com> wrote:
>>>> >
>>>> > Thanks. I think this is shaping up to be a great proposal.
>>>> >
>>>> > - Cham
>>>> >
>>>> > On Fri, Feb 11, 2022 at 7:12 AM Jarek Potiuk wrote:
>>>> >>
>>>> >> Cool. Looking forward to seeing how it goes for Beam. We will also be
>>>> >> at the point soon that likely we will want to do something more
>>>> >> sophisticated!
>>>> >>
>>>> >> On Fri, Feb 11, 2022 at 4:08 PM Danny McCormick <
>>>> >> dannymccorm...@google.com> wrote:
>>>> >>>
>>>> >>> Hey Jarek, thanks for chiming in - I've been really appreciative of
>>>> >>> the Airflow perspective (here and in the GitHub issues conversation),
>>>> >>> and definitely hope we can keep learning from each other! We did
>>>> >>> consider CODEOWNERs, but ultimately decided against it because it
>>>> >>> couldn't hit some of our goals - specifically:
>>>> >>>
>>>> >>> 1. Providing multiple passes of assignment (once to a larger set of
>>>> >>> reviewers, and then again to a second set of committers).
>>>> >>>
>>>> >>> 2. Balancing reviews - like you mentioned, there's not a great way
>>>> >>> to do round robining, or even assign to a single person from a set of
>>>> >>> people. Technically you can actually do this if every codeowner is
>>>> >>> part of a team
>>>> >>> (https://twitter.com/github/status/1194673101117808653?lang=en),
>>>> >>> but many Beam reviewers in our new model won't be a part of the Apache
>>>> >>> org. (Maybe that feature would be of interest to Airflow though? It
>>>> >>> looks like maybe all of your CODEOWNERS are part of the Apache org? I
>>>> >>> can't 100% tell).
>>>> >>>
>>>> >>> 3. Don't break the existing use case where a contributor wants a
>>>> >>> review from a specific person.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Danny
>>>> >>>
>>>> >>> On Thu, Feb 10, 2022 at 7:52 AM Jarek Potiuk wrote:
>>>> >>>>
>>>> >>>> Very interesting one - as an outsider I am interested to see how
>>>> >>>> this initiative will work out for the Beam community.
>>>> >>>>
>>>> >>>> Just one comment - maybe you do not know but in GitHub there is a
>>>> >>>> "CODEOWNERS" feature (I notice you are not using it). Quote from
>>>> >>>> https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
>>>> >>>>
>>>> >>>> | Code owners are automatically requested for review when someone
>>>> >>>> | opens a pull request that modifies code that they own. Code owners
>>>> >>>> | are not automatically requested to review draft pull requests. For
>>>> >>>> | more information about draft pull requests, see "About pull
>>>> >>>> | requests." When you mark a draft pull request as ready for review,
>>>> >>>> | code owners are automatically notified. If you convert a pull
>>>> >>>> | request to a draft, people who are already subscribed to
>>>> >>>> | notifications are not automatically unsubscribed. For more
>>>> >>>> | information, see "Changing the stage of a pull request."
>>>> >>>>
>>>> >>>> This is an extremely poor version of what you try to do in Beam
>>>> >>>> (just assign everyone who is code owner as reviewer, no round-robin,
>>>> >>>> no reviewers role etc.), but maybe you want to try it quickly if you
>>>> >>>> want to test if any kind of "ownership" might help with at least
>>>> >>>> initial vetting of PRs.
>>>> >>>> This feature is enabled by literally committing one -
>>>> >>>> gitignore-like - file to repo, so it can be introduced extremely
>>>> >>>> quickly.
>>>> >>>>
>>>> >>>> Airflow's CODEOWNERS here as an example:
>>>> >>>> https://github.com/apache/airflow/blob/main/.github/CODEOWNERS
>>>> >>>>
>>>> >>>> J.
>>>> >>>>
>>>> >>>> On Thu, Feb 10, 2022 at 7:31

Re: [RFC][Design] Automate Reviewer Assignment

2022-02-14 Thread Kenneth Knowles
Yea, great proposal. I expect we'll discover further refinements through
experience more than deliberation, so I don't have any more comments on the
doc.

Kenn

On Mon, Feb 14, 2022 at 9:04 AM Kerry Donny-Clark 
wrote:

> Thanks Danny, we can try this out and update as well. Everyone, please let
> us know how this is working in practice once we roll it out.
> Kerry
>
> On Mon, Feb 14, 2022 at 11:23 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> Thank you everyone who has chimed in here or on the doc - there's been a
>> lot of good discussion and I think that will lead to a much better outcome!
>>
>> Since there's been general support for the idea and the flow of new
>> comments tapered off a bit before the weekend, I'm going to go ahead and
>> start to move forward with building out the automation (tracking JIRA here
>> - https://issues.apache.org/jira/browse/BEAM-13925). Please feel free to
>> leave any more thoughts in the doc and I promise I will respond/work to
>> incorporate any thoughts that merit tweaking the design.
>>
>> Thanks,
>> Danny
>>
>> On Fri, Feb 11, 2022 at 12:34 PM Robert Bradshaw 
>> wrote:
>>
>>> This looks like a great plan! I remember being disappointed when
>>> CODEOWNERs didn't meet our needs, but this looks like it resolves all
>>> those issues.
>>>
>>> On Fri, Feb 11, 2022 at 9:02 AM Chamikara Jayalath 
>>> wrote:
>>> >
>>> > Thanks. I think this is shaping up to be a great proposal.
>>> >
>>> > - Cham
>>> >
>>> > On Fri, Feb 11, 2022 at 7:12 AM Jarek Potiuk  wrote:
>>> >>
>>> >> Cool. Looking forward to seeing how it goes for Beam. We will also be at
>>> the point soon that likely we will want to do something more sophisticated!
>>> >>
>>> >> On Fri, Feb 11, 2022 at 4:08 PM Danny McCormick <
>>> dannymccorm...@google.com> wrote:
>>> >>>
>>> >>> Hey Jarek, thanks for chiming in - I've been really appreciative of
>>> the Airflow perspective (here and in the GitHub issues conversation), and
>>> definitely hope we can keep learning from each other! We did consider
>>> CODEOWNERs, but ultimately decided against it because it couldn't hit some
>>> of our goals - specifically:
>>> >>>
>>> >>> 1. Providing multiple passes of assignment (once to a larger set of
>>> reviewers, and then again to a second set of committers).
>>> >>>
>>> >>> 2. Balancing reviews - like you mentioned, there's not a great way
>>> to do round robining, or even assign to a single person from a set of
>>> people. Technically you can actually do this if every codeowner is part of
>>> a team (https://twitter.com/github/status/1194673101117808653?lang=en),
>>> but many Beam reviewers in our new model won't be a part of the Apache org.
>>> (Maybe that feature would be of interest to Airflow though? It looks like
>>> maybe all of your CODEOWNERS are part of the Apache org? I can't 100% tell).
>>> >>>
>>> >>> 3. Don't break the existing use case where a contributor wants a
>>> review from a specific person.
>>> >>>
>>> >>> Thanks,
>>> >>> Danny
>>> >>>
>>> >>> On Thu, Feb 10, 2022 at 7:52 AM Jarek Potiuk 
>>> wrote:
>>> >>>>
>>> >>>> Very interesting one - as an outsider I am interested to see how
>>> >>>> this initiative will work out for the Beam community.
>>> >>>>
>>> >>>> Just one comment - maybe you do not know but in GitHub there is a
>>> >>>> "CODEOWNERS" feature (I notice you are not using it). Quote from
>>> >>>> https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
>>> >>>>
>>> >>>> | Code owners are automatically requested for review when someone
>>> >>>> | opens a pull request that modifies code that they own. Code owners
>>> >>>> | are not automatically requested to review draft pull requests. For
>>> >>>> | more information about draft pull requests, see "About pull
>>> >>>> | requests." When you mark a draft pull request as ready for review,
>>> >>>> | code owners are automatically notified. If you convert a pull
>>> >>>> | request to a draft, people who are already subscribed to
>>> >>>> | notifications are not automatically unsubscribed. For more
>>> >>>> | information, see "Changing the stage of a pull request."
>>> >>>>
>>> >>>> This is an extremely poor version of what you try to do in Beam
>>> >>>> (just assign everyone who is code owner as reviewer, no round-robin,
>>> >>>> no reviewers role etc.), but maybe you want to try it quickly if you
>>> >>>> want to test if any kind of "ownership" might help with at least
>>> >>>> initial vetting of PRs.
>>> >>>> This feature is enabled by literally committing one -
>>> >>>> gitignore-like - file to repo, so it can be introduced extremely
>>> >>>> quickly.
>>> >>>>
>>> >>>> Airflow's CODEOWNERS here as an example:
>>> >>>> https://github.com/apache/airflow/blob/main/.github/CODEOWNERS
>>> >>>>
>>> >>>> J.
>>> >>>>
>>> >>>> On Thu, Feb 10, 2022 at 7:31 AM Ahmet Altay wrote:
>>> >>>>>
>>> >>>>> Thank you Danny. I think this is a great problem to solve, and the
>>> >>>>> proposal looks great too :) I added comments as others but overall I
>>> >>>>> like it.
>>> >>>>>
>>> >>>>> On

[Question] Dataproc 1.5 - Flink version conflict

2022-02-14 Thread Andoni Guzman Becerra
Hi All, I'm working on re-enabling some tests like
LoadTests_Combine_Flink_Python.groovy and fixing some VMs leaked by those
tests. https://issues.apache.org/jira/browse/BEAM-12898
The version of Dataproc used before was 1.2 and now it's 1.5.
The problem is that Dataproc 1.5 ships Flink 1.9, while we currently use
Flink 1.13, which causes a mismatch and errors when running the tests.
In Dataproc 1.2, an init script was passed with all the information about the
Flink version, but with optional components we can only specify which
component to install.

This was the way to create a cluster in dataproc 1.2

gcloud dataproc clusters create $CLUSTER_NAME --region=global \
  --num-workers=$num_dataproc_workers \
  --initialization-actions $DOCKER_INIT,$BEAM_INIT,$FLINK_INIT \
  --metadata "${metadata}", \
  --image-version=$image_version --zone=$GCLOUD_ZONE --quiet

And this is the way to do it in dataproc 1.5

gcloud dataproc clusters create $CLUSTER_NAME --region=global \
  --num-workers=$num_dataproc_workers --metadata "${metadata}", \
  --image-version=$image_version --zone=$GCLOUD_ZONE \
  --optional-components=FLINK,DOCKER --quiet

Is there a way to force the Flink version in Dataproc? I tried to use
FLINK_INIT as an initialization action, but it didn't work.

Any help would be appreciated.

Thank you!

Andoni Guzman | WIZELINE

Software Engineer II

andoni.guz...@wizeline.com

Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.



RE: Re: [Question][Contribution] Python SDK ByteKeyRange

2022-02-14 Thread Sami Niemi
Hello Robert,

Beam documents only OffsetRangeTracker [1] for the new SDF API. Since Beam is
moving away from the Source API, I thought it would be nice to develop IO
connectors using the new SDFs. For this I need to create a restriction tracker
that follows the new SDF API.

So I propose adding ByteKeyRange as a new restriction class and
ByteKeyRestrictionTracker as a new restriction tracker class. My
implementation also uses a ByteKey class, instances of which are passed to
the restriction.

[1] https://github.com/apache/beam/blob/7eb7fd017a43353204eb8037603409dda7e0414a/sdks/python/apache_beam/io/restriction_trackers.py#L76
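
To make the proposal concrete, here is a minimal sketch of what a byte-key restriction and its tracker could look like. It is not the implementation described in the message, which is not shown in this thread; it uses plain `bytes` instead of a ByteKey class, and it does not subclass Beam's `RestrictionTracker`. The method names `current_restriction` and `try_claim` loosely mirror that interface, while `checkpoint` is a hypothetical simplification of splitting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByteKeyRange:
    """Half-open key range [start, end); end=None means unbounded (assumption)."""

    start: bytes = b""
    end: Optional[bytes] = None

    def contains(self, key: bytes) -> bool:
        # Python compares bytes lexicographically, which is the order we need.
        return key >= self.start and (self.end is None or key < self.end)


class ByteKeyRestrictionTracker:
    """Illustrative tracker; not a subclass of Beam's RestrictionTracker."""

    def __init__(self, restriction: ByteKeyRange) -> None:
        self._restriction = restriction
        self._last_claimed: Optional[bytes] = None

    def current_restriction(self) -> ByteKeyRange:
        return self._restriction

    def try_claim(self, key: bytes) -> bool:
        if key < self._restriction.start:
            raise ValueError("key precedes the start of the restriction")
        if self._last_claimed is not None and key <= self._last_claimed:
            raise ValueError("keys must be claimed in strictly increasing order")
        if not self._restriction.contains(key):
            return False  # At or past the end: the caller should stop.
        self._last_claimed = key
        return True

    def checkpoint(self) -> ByteKeyRange:
        """Keep the claimed part as primary and return the rest as residual."""
        if self._last_claimed is None:
            split_key = self._restriction.start
        else:
            # key + b"\x00" is the smallest byte string greater than key.
            split_key = self._last_claimed + b"\x00"
        residual = ByteKeyRange(split_key, self._restriction.end)
        self._restriction = ByteKeyRange(self._restriction.start, split_key)
        return residual
```

A short usage example: after claiming `b"a"` and `b"c"` from the range `[b"a", b"m")`, `checkpoint()` shrinks the primary restriction to `[b"a", b"c\x00")` and returns `[b"c\x00", b"m")` as the residual. Fractional splitting would additionally need key interpolation, which this sketch leaves out.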

On 2022/02/11 18:27:23 Robert Bradshaw wrote:
> Hi Sami! Glad to hear you're willing to contribute.
>
> Though the name is a bit different, I'm wondering if this is already
> present as LexicographicKeyRangeTracker.
> https://github.com/apache/beam/blob/release-2.35.0/sdks/python/apache_beam/io/range_trackers.py#L349
>
> On Fri, Feb 11, 2022 at 9:54 AM Ahmet Altay <al...@google.com> wrote:
> >
> > Hi Sami. Thank you for your interest.
> >
> > Adding people who might be able to comment: @Chamikara Jayalath @Lukasz Cwik
> >
> > On Thu, Feb 10, 2022 at 8:38 AM Sami Niemi <sa...@solita.fi> wrote:
> >>
> >> Hello,
> >>
> >> I noticed that the Python SDK only has implementations of
> >> OffsetRangeTracker and OffsetRange, while Java also has ByteKeyRange and
> >> -Tracker.
> >>
> >> I have currently created simple implementations of the following Python
> >> classes:
> >>
> >> ByteKey
> >> ByteKeyRange
> >> ByteKeyRestrictionTracker
> >>
> >> I would like to contribute these and make them available in the Python SDK
> >> in addition to OffsetRange and -Tracker. I would like to hear any thoughts
> >> about this, and whether I should make a contribution.
> >>
> >> Thank you,
> >>
> >> Sami Niemi
>

SAMI NIEMI
Data Engineer
+358 50 412 2115
sami.ni...@solita.fi

SOLITA
Eteläesplanadi 8
00130 Helsinki
solita.fi



Flaky test issue report (49)

2022-02-14 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13859: Test flake: 
test_split_half_sdf (created 2022-02-09)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: 
apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine
 is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)

P1 issues report (71)

2022-02-14 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13921: 
validatesCrossLanguageRunnerGoUsingJava  failing for beam_PostCommit_XVR_Spark 
(created 2022-02-10)
https://issues.apache.org/jira/browse/BEAM-13920: Beam x-lang Dataflow 
tests failing due to _InactiveRpcError (created 2022-02-10)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13830: XVR Direct/Spark/Flink 
tests are timing out (created 2022-02-04)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes 
Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13769: 
beam_PreCommit_Python_Cron failing on test_create_uses_coder_for_pickling 
(created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13763: Rotate credentials for 
'io-datastores' Kubernetes cluster (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13694: 
beam_PostCommit_Java_Hadoop_Versions failing with ClassDefNotFoundError 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request 
Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13376: Missing error for 
nonexistent column family BigTable (created 2021-12-03)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH 

P0 (outage) report

2022-02-14 Thread Beam Jira Bot
This is your daily summary of Beam's current outages. See 
https://beam.apache.org/contribute/jira-priorities/#p0-outage for the meaning 
and expectations around P0 issues.

BEAM-13931: BigQueryIO is sending rows that are too large to Deadletter 
Queue even on RETRY_ALWAYS (https://issues.apache.org/jira/browse/BEAM-13931)


Beam Dependency Check Report (2022-02-14)

2022-02-14 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
cachetools | 4.2.4 | 5.0.0 | 2021-12-27 | 2021-12-27 | BEAM-9017
chromedriver-binary | 96.0.4664.45.0 | 99.0.4844.17.0 | 2021-11-18 | 2022-02-07 | BEAM-10426
dill | 0.3.1.1 | 0.3.4 | 2019-10-07 | 2021-06-14 | BEAM-11167
google-api-core | 1.31.5 | 2.5.0 | 2021-12-20 | 2022-02-07 | BEAM-12784
google-auth | 1.35.0 | 2.6.0 | 2021-08-23 | 2022-02-07 | BEAM-12785
google-cloud-bigtable | 1.7.0 | 2.5.0 | 2021-04-12 | 2022-02-14 | BEAM-8127
google-cloud-datastore | 1.15.3 | 2.4.0 | 2020-11-16 | 2021-11-18 | BEAM-8443
google-cloud-language | 1.3.0 | 2.3.2 | 2020-10-26 | 2022-01-24 | BEAM-8
google-cloud-recommendations-ai | 0.2.0 | 0.5.1 | 2021-07-05 | 2021-11-18 | BEAM-13273
google-cloud-spanner | 1.19.1 | 3.13.0 | 2020-11-16 | 2022-02-07 | BEAM-10345
google-cloud-videointelligence | 1.16.1 | 2.5.1 | 2020-11-23 | 2021-11-18 | BEAM-11319
google-cloud-vision | 1.0.0 | 2.6.3 | 2020-03-24 | 2021-12-13 | BEAM-9581
grpcio-tools | 1.37.0 | 1.43.0 | 2021-04-12 | 2021-12-20 | BEAM-9582
ipykernel | 5.5.6 | 6.9.0 | 2021-10-11 | 2022-02-14 | BEAM-12575
ipython | 7.31.1 | 8.0.1 | 2022-01-24 | 2022-01-24 | BEAM-13670
jupyter-client | 6.1.12 | 7.1.2 | 2021-04-12 | 2022-01-24 | BEAM-12786
mistune | 0.8.4 | 2.0.2 | 2021-12-06 | 2022-01-17 | BEAM-13382
mock | 2.0.0 | 4.0.3 | 2019-05-20 | 2020-12-14 | BEAM-7369
mypy-protobuf | 1.18 | 3.2.0 | 2020-03-24 | 2022-01-24 | BEAM-10346
Pillow | 7.2.0 | 9.0.1 | 2020-10-19 | 2022-02-07 | BEAM-11071
pluggy | 0.13.1 | 1.0.0 | 2021-08-30 | 2021-08-30 | BEAM-12819
PyHamcrest | 1.10.1 | 2.0.3 | 2020-01-20 | 2021-12-13 | BEAM-9155
pymongo | 3.12.3 | 4.0.1 | 2021-12-13 | 2021-12-13 | BEAM-13383
pyparsing | 2.4.7 | 3.0.7 | 2021-11-18 | 2022-01-24 | BEAM-13274
pytest | 4.6.11 | 7.0.1 | 2020-07-08 | 2022-02-14 | BEAM-8606
pytest-timeout | 1.4.2 | 2.1.0 | 2021-10-11 | 2022-01-24 | BEAM-13029
pytest-xdist | 1.34.0 | 2.5.0 | 2020-08-17 | 2021-12-13 | BEAM-10713
setuptools | 60.6.0 | 60.9.0 | 2022-01-31 | 2022-02-14 | BEAM-10714
tenacity | 5.1.5 | 8.0.1 | 2019-11-11 | 2021-07-19 | BEAM-8607

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.alibaba:fastjson | 1.2.69 | 1.2.79 | 2020-05-31 | 2021-12-19 | BEAM-8632
com.azure:azure-core | 1.9.0 | 1.25.0 | 2020-10-02 | 2022-02-04 | BEAM-11888
com.azure:azure-identity | 1.0.8 | 1.4.4 | 2020-07-07 | 2022-02-08 | BEAM-11814
com.azure:azure-storage-blob | 12.10.0 | 12.15.0-beta.3 | 2021-01-15 | 2022-02-09 | BEAM-10800
com.azure:azure-storage-common | 12.10.0 | 12.15.0-beta.3 | 2021-01-14 | 2022-02-09 | BEAM-11889
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18 | BEAM-8674
com.esotericsoftware:kryo | 4.0.2 | 5.3.0 | 2018-03-20 | 2022-02-11 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.42.0 | 2020-09-14 | 2022-02-07 | BEAM-6645
com.github.jbellis:jamm | 0.3.0 | 0.3.3 | 2014-11-19 | 2018-11-16 | BEAM-13622
com.github.jk1.dependency-license-report:com.github.jk1.dependency-license-report.gradle.plugin | 1.16 | 2.1 | 2020-10-26 | 2022-01-24 | BEAM-11120
com.github.spotbugs:spotbugs | 4.0.6 | 4.5.3 | 2020-06-23 | 2022-01-05 | BEAM-7792
com.github.spotbugs:spotbugs-annotations | 4.0.6 | 4.5.3 | 2020-06-23 | 2022-01-05 | BEAM-6951
com.google.api:gax | 2.8.1 | 2.12.2

Beam Metrics Report (2022-02-14)

2022-02-14 Thread Apache Jenkins Server