Jira Contributor Permission Request

2022-02-10 Thread Daria Bezkorovaina
Hi folks,


Please grant me contributor access to the Apache Beam Jira. I work with the
Akvelon team on Apache Beam website updates and case studies, and need to
create and assign tickets.


Jira username: daria.bezkorovaina


Thank you in advance!

DARIA BEZKOROVAINA | AKVELON



Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-02-10 Thread Aizhamal Nurmamat kyzy
Hi all,

I think we've had a chance to discuss shortcomings and advantages. I think
each person may have a different bias / preference. My bias is to move to
GitHub, to have a more inclusive, approachable project despite the
differences in workflow. So I'm +1 on moving.

Could others share their bias? Don't think of this as a vote, but I'd like
to get a sense of people's preferences, to see if there's a strong/slight
feeling either way.

Again, the sticking points are summarized here [1]; feel free to add to the
doc.

[1]
https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#


On Mon, Jan 31, 2022 at 7:23 PM Aizhamal Nurmamat kyzy 
wrote:

> Welcome to the Beam community, Danny!
>
> We would love your help if/when we end up migrating.
>
> Please add your comments to the doc I shared[1], in case we missed some
> cool GH features that could be helpful. Thanks!
>
> [1]
> https://docs.google.com/document/d/1_n7gboVbSKPs-CVcHzADgg8qpNL9igiHqUPCmiOslf0/edit#
>
> On Mon, Jan 31, 2022, 10:06 AM Danny McCormick 
> wrote:
>
>> > Then (this is something you'd have to code) you could easily write or
>> > use an existing GithubAction or bot that will assign the labels based on
>> > the initial selection done by the user at entry. We have not done it yet
>> > but we might.
>>
>> Hey, new contributor here - wanted to chime in with a shameless plug
>> because I happen to have written an action that does pretty much exactly
>> what you're describing[1] and could be extensible to the use case discussed
>> here - it should basically just require writing some config (example in
>> action[2]). In general, automated management of labels based on the initial
>> issue description + content isn't too hard; it does get significantly
>> trickier (but definitely still possible) if you try to automate labels
>> based on responses or edits.
>>
>> Also, big +1 that the easy integration with Actions is a significant
>> advantage of using issues since it helps keep your automations in one place
>> (or at least fewer places) and gives you a lot of tools out of the box both
>> from the community and from the Actions org. *Disclaimer:* I am
>> definitely biased. Until 3 weeks ago I was working on the Actions team at
>> GitHub.
>>
>> I'd be happy to help with some of the issue automation if we decide that
>> would be helpful, whether that's reusing existing work or tailoring it more
>> exactly to the Beam use case.
>>
>> [1] https://github.com/damccorm/tag-ur-it
>> [2] https://github.com/microsoft/azure-pipelines-tasks/issues/15839
>>
>> Thanks,
>> Danny
>>
>> On Mon, Jan 31, 2022 at 12:49 PM Zachary Houfek 
>> wrote:
>>
>>> > You can link PR to the issue by just mentioning #Issue in the commit
>>> > message. If you do not prefix it with "Closes:" "Fixes:" or similar it will
>>> > be just linked
>>>
>>> Ok, thanks for the clarification there.
>>>
>>> Regards,
>>> Zach
>>>
>>> On Mon, Jan 31, 2022 at 12:43 PM Cristian Constantinescu <
>>> zei...@gmail.com> wrote:
>>>
>>>> I've been semi-following this thread, apologies if this has been raised
>>>> already.
>>>>
>>>> From a user point of view, in some corporate environments (that I've
>>>> worked at), GitHub is blocked. That includes the issues part. The Apache
>>>> Jira is not blocked and does at times provide value while investigating
>>>> issues.
>>>>
>>>> Obviously, users stuck in those unfortunate circumstances can just use
>>>> their personal device. Not advocating any direction on the matter, just
>>>> putting this out there.
>>>>
>>>> On Mon, Jan 31, 2022 at 12:21 PM Zachary Houfek
>>>> wrote:

>>>>> I added a suggestion that I don't think was discussed here:
>>>>>
>>>>> I know that we currently can link multiple PRs to a single Jira, but
>>>>> GitHub assumes a PR linked to an issue fixes the issue. You also need
>>>>> write access to the repository to link the PR outside of using a
>>>>> "closing keyword". (For reference: Linking a pull request to an issue)
>>>>>
>>>>> I'm not sure how much this could sway the decisions but thought it was
>>>>> worth bringing up.
>>>>>
>>>>> Regards,
>>>>> Zach
>>>>>
>>>>> On Mon, Jan 31, 2022 at 12:06 PM Jarek Potiuk
>>>>> wrote:
>>>>>
>>>>>> Just a comment here to clarify the labels from someone who has been
>>>>>> using both - ASF (and not only) JIRA and GitHub.
>>>>>>
>>>>>> The experience from JIRA labels might be awfully misleading. The
>>>>>> JIRA labels are a mess in the ASF because they are shared between
>>>>>> projects and everyone can create a new label. "Mess" is actually quite
>>>>>> an understatement IMHO.
>>>>>>
>>>>>> The labels in GitHub Issues are "per-project" and they can only be
>>>>>> added and modified by maintainers (and only maintainers and "issue
>>>>>> triagers" can actually assign them, other than the initial assignment
>>>>>> when you create an issue).
>>>>>>
>>>>>>
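
Editor's illustration of the linking behaviour discussed in this thread: a
reference such as "#123" in a commit message only links the issue, while a
reference preceded by a closing keyword ("Closes", "Fixes", "Resolves" and
their variants) also closes it. The sketch below is a rough approximation and
not GitHub's actual parser; the function name classify_references is
hypothetical, the optional colon follows the "Closes:" / "Fixes:" spelling
quoted above, and cross-repository references and issue URLs are ignored.

```python
import re
from typing import Dict

# Closing keywords listed in GitHub's documentation (matched case-insensitively).
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# An optional closing keyword (with an optional colon), then "#<number>".
_REFERENCE = re.compile(
    r"(?:\b(?P<keyword>" + "|".join(CLOSING_KEYWORDS) + r"):?\s+)?#(?P<number>\d+)",
    re.IGNORECASE,
)


def classify_references(message: str) -> Dict[int, str]:
    """Map each referenced issue number to 'closes' or 'mentions'."""
    result: Dict[int, str] = {}
    for match in _REFERENCE.finditer(message):
        number = int(match.group("number"))
        kind = "closes" if match.group("keyword") else "mentions"
        # A closing reference wins over a plain mention of the same issue.
        if kind == "closes" or number not in result:
            result[number] = kind
    return result
```

For example, classify_references("Fixes #12, see also #34") returns
{12: 'closes', 34: 'mentions'}.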

P1 issues report (72)

2022-02-10 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13858: Failure of 
:sdks:go:examples:wordCount in check "Mac run local environment shell script" 
(created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13855: Dataflow postcommits 
timing out (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13830: XVR Direct/Spark/Flink 
tests are timing out (created 2022-02-04)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes 
Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13769: 
beam_PreCommit_Python_Cron failing on test_create_uses_coder_for_pickling 
(created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13763: Rotate credentials for 
'io-datastores' Kubernetes cluster (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13694: 
beam_PostCommit_Java_Hadoop_Versions failing with ClassDefNotFoundError 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request 
Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi 
environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13376: Missing error for 
nonexistent column family BigTable (created 2021-12-03)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)

Flaky test issue report (52)

2022-02-10 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13859: Test flake: 
test_split_half_sdf (created 2022-02-09)
https://issues.apache.org/jira/browse/BEAM-13855: Dataflow postcommits 
timing out (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: 
apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine
 is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, 
Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 
2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 

[Question][Contribution] Python SDK ByteKeyRange

2022-02-10 Thread Sami Niemi
Hello,

I noticed that the Python SDK only has implementations of OffsetRange and
OffsetRangeTracker, while Java also has ByteKeyRange and ByteKeyRangeTracker.

I have created simple implementations of the following Python classes:

  *   ByteKey
  *   ByteKeyRange
  *   ByteKeyRestrictionTracker

I would like to contribute these and make them available in the Python SDK
alongside OffsetRange and OffsetRangeTracker. I would like to hear any
thoughts on this, and whether I should make the contribution.

Thank you,
Sami Niemi
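
Editor's illustration: for readers who have not used the Java classes, the
sketch below shows roughly what a byte-key range and a tracker over it might
look like in plain Python. It is not Sami's implementation and not the Beam
API. The class and method names mirror the ones mentioned above but are
hypothetical here, and treating an empty end key as "no upper bound" is an
assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByteKeyRange:
    """Half-open range [start, end) over byte-string keys.

    Assumption for this sketch: an empty end key means "no upper bound".
    """

    start: bytes = b""
    end: bytes = b""

    def contains(self, key: bytes) -> bool:
        # Python compares bytes lexicographically by unsigned byte value.
        return key >= self.start and (self.end == b"" or key < self.end)


class ByteKeyRestrictionTracker:
    """Tracks claims of strictly increasing keys within a ByteKeyRange."""

    def __init__(self, key_range: ByteKeyRange) -> None:
        self._range = key_range
        self._last_claimed: Optional[bytes] = None

    def try_claim(self, key: bytes) -> bool:
        """Return True if the key lies in the range, False once past its end."""
        if key < self._range.start:
            raise ValueError("key is before the start of the range")
        if self._last_claimed is not None and key <= self._last_claimed:
            raise ValueError("keys must be claimed in strictly increasing order")
        if not self._range.contains(key):
            return False
        self._last_claimed = key
        return True
```

A real restriction tracker would also need to support splitting the remaining
range and reporting progress, which this sketch leaves out.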


Re: [RFC][Design] Automate Reviewer Assignment

2022-02-10 Thread Jarek Potiuk
Very interesting one - as an outsider I am interested to see how this
initiative will work out for the beam community.

Just one comment - maybe you do not know but in GitHub there is a
"CODEOWNERS" feature (I notice you are not using it). Quote from
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners

| Code owners are automatically requested for review when someone opens a
pull request that modifies code that they own. Code owners are not
automatically requested to review draft pull requests. For more information
about draft pull requests, see "About pull requests." When you mark a draft
pull request as ready for review, code owners are automatically notified.
If you convert a pull request to a draft, people who are already subscribed
to notifications are not automatically unsubscribed. For more information,
see "Changing the stage of a pull request."

This is an extremely poor version of what you are trying to do in Beam (it
just assigns everyone who is a code owner as a reviewer: no round-robin, no
reviewer role, etc.), but you may want to try it quickly to test whether any
kind of "ownership" helps with at least the initial vetting of PRs.
This feature is enabled by literally committing one gitignore-like file
to the repo, so it can be introduced extremely quickly.

Airflow's CODEOWNERS here as an example:
https://github.com/apache/airflow/blob/main/.github/CODEOWNERS

J.

On Thu, Feb 10, 2022 at 7:31 AM Ahmet Altay  wrote:

> Thank you Danny. I think this is a great problem to solve, and the
> proposal looks great too :) I added comments as others but overall I like
> it.
>
> On Wed, Feb 9, 2022 at 3:02 PM Brian Hulette  wrote:
>
>> Thanks Danny! I left a few suggestions in the doc but I very much like
>> this idea overall.
>>
>> I especially like that "reviewers" is orthogonal to "committers", giving
>> new contributors a clear way to volunteer to help out with code reviews. If
>> we do this we should document it in the contribution guide [1].
>>
>> [1] https://beam.apache.org/contribute/
>>
>> On Wed, Feb 9, 2022 at 2:54 PM Kerry Donny-Clark 
>> wrote:
>>
>>> Danny, this looks like a great mechanism to ensure we review PRs quickly
>>> and distribute the review work more evenly.
>>> Thanks for outlining a clear plan. I strongly support this.
>>> Kerry
>>>
>>> On Wed, Feb 9, 2022, 5:16 PM Danny McCormick 
>>> wrote:
>>>
>>>> Hey everyone, I put together a design doc for automating the assignment
>>>> of reviewers in Beam pull requests. I'd appreciate any thoughts you have!
>>>>
>>>> Right now, we don't have a well-defined automated system for staying on
>>>> top of pull request reviews - we rely on contributors being able to find
>>>> the correct OWNERS file and committers manually triaging/calling attention
>>>> to old pull requests. This doc proposes adding automation driven by GitHub
>>>> Actions to automatically round robin new PR reviews to a set of
>>>> contributors, thus balancing the load. It also proposes adding a new role
>>>> within the Beam community of a reviewer who is responsible for an
>>>> initial code review on some PRs before they are routed to a committer for
>>>> final review.
>>>>
>>>> Please share any feedback or support here -
>>>> https://docs.google.com/document/d/1FhRPRD6VXkYlLAPhNfZB7y2Yese2FCWBzjx67d3TjBo/edit?usp=sharing
>>>>
>>>> Thanks,
>>>> Danny
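
Editor's illustration of the round-robin idea described in the message above:
each new PR goes to the next reviewer in a fixed rotation, skipping the PR's
own author. This is a minimal sketch under those assumptions only, not the
algorithm from the design doc; the function name assign_reviewers and the
reviewer names in the example are hypothetical.

```python
from typing import List


def assign_reviewers(pr_authors: List[str], reviewers: List[str]) -> List[str]:
    """Pick one reviewer per PR by rotating through `reviewers` in order.

    A reviewer is skipped for a PR they authored; the rotation then resumes
    after whoever was actually assigned.
    """
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    assignments: List[str] = []
    next_index = 0
    for author in pr_authors:
        for offset in range(len(reviewers)):
            candidate = reviewers[(next_index + offset) % len(reviewers)]
            if candidate != author:
                assignments.append(candidate)
                next_index = (next_index + offset + 1) % len(reviewers)
                break
        else:
            # Only reachable when the author is the sole reviewer.
            raise ValueError(f"no eligible reviewer for a PR by {author}")
    return assignments
```

With reviewers ["alice", "bob", "carol"] and PRs authored by "dave", "bob",
"erin" and "alice", the assignments are ["alice", "carol", "alice", "bob"].
Note that a skipped reviewer simply misses that turn; a production version
would more likely track open review counts per reviewer to balance load.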

>>>