Beam Dependency Check Report (2023-05-10)

2023-05-11 Thread Apache Jenkins Server


Re: [Notice] Jenkins seed job comment trigger no longer working, and possible solutions

2023-05-11 Thread Anand Inguva via dev
+1 to add committers to the list manually.

Thanks Yi for doing this.




Re: [Notice] Jenkins seed job comment trigger no longer working, and possible solutions

2023-05-11 Thread Danny McCormick via dev
I'm +1 on just adding committers to a list manually. Having the ability to
run seed jobs from a PR is nice, but adding a new committer is a rare
enough event that automating is not worth the time IMO (as opposed to
documenting this as something to do when you're a new committer). Plus this
problem goes away entirely if we move to GitHub Actions :)

One thing I'll note: there is an automation route that involves querying
the teams from the Apache GitHub org. This would require us to upload a
custom PAT, though, which incurs secret rotation and is more work than it's
worth IMO.

If we decide to do this, I have https://github.com/apache/beam/pull/26672
prepared.

Thanks,
Danny



[Notice] Jenkins seed job comment trigger no longer working, and possible solutions

2023-05-11 Thread Yi Hu via dev
Dear Beam Developers,

tl;dr For PRs authored by Beam committers that involve Jenkins task changes,
the "Run seed job" comment trigger no longer works due to an Apache Infra
change.

Due to a recent Apache Infra change on the LDAP server, Beam Jenkins CI/CD
no longer has access to the GitHub username list. Consequently, several
Jenkins tasks whose triggers used to be enabled for committers can no
longer be triggered by commenting a phrase on a PR (e.g. "Run seed job").

The full list of affected jobs is:


   - seed_00_job
   - seed_job_standalone
   - beam_Publish_Docker_Snapshots
   - beam_Dependency_Check
   - beam_Metrics_Report

Jobs other than the seed job are release-related workflows and should not
affect development on the code base.

I have created a PR to temporarily remove the step of fetching GitHub
usernames [2] to get the seed job back to green. After that, I would like
to ask the community whether it is fine to either:


   - Leave these jobs without a comment trigger (they can still be
   triggered manually via the steps described in [2], in addition to
   their scheduled runs)
   - Maintain a list of committer GitHub usernames manually in
   https://github.com/apache/beam/blob/master/.test-infra/jenkins/Committers.groovy
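
As a rough illustration of the second option, here is a minimal sketch of
gating a comment trigger on a manually maintained username list. It is
written in Python for brevity and is not the contents of Committers.groovy
or the actual Jenkins configuration; the usernames, the names
COMMITTER_GITHUB_USERNAMES and should_trigger, and the case-insensitive
phrase matching are all assumptions made for this example.

```python
# Hypothetical allowlist: placeholder usernames, not real Beam committers.
COMMITTER_GITHUB_USERNAMES = frozenset({
    "example-committer-a",
    "example-committer-b",
})

# Trigger phrase taken from the thread; the matching rules are an assumption.
TRIGGER_PHRASE = "Run seed job"


def should_trigger(commenter, comment_body,
                   allowlist=COMMITTER_GITHUB_USERNAMES,
                   phrase=TRIGGER_PHRASE):
    """Return True only if an allowlisted user posted the trigger phrase."""
    # GitHub usernames are case-insensitive, so compare in lowercase.
    is_committer = commenter.lower() in {name.lower() for name in allowlist}
    # Ignore surrounding whitespace and case when matching the phrase.
    is_phrase = comment_body.strip().lower() == phrase.lower()
    return is_committer and is_phrase
```

With a list like this, adding a new committer is a one-line change to the
list, which is the manual step this option would require.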


Please feel free to share if you have a better idea for fixing this.

See more context in:
[1] https://github.com/apache/beam/issues/26602
[2] https://github.com/apache/beam/pull/26652


Regards,
Yi

-- 

Yi Hu, (he/him/his)

Software Engineer


Beam High Priority Issue Report (35)

2023-05-11 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P0 Issues:

https://github.com/apache/beam/issues/26661 [Bug]: JDBCIO Read without 
Partition occur GC overhead limit exceeded
https://github.com/apache/beam/issues/26602 [Failing Test]: Seed job permared 
due to insufficient_access LDAP


Unassigned P1 Issues:

https://github.com/apache/beam/issues/26621 [Failing Test]: 
beam_PerformanceTests_SparkReceiver_IO failing
https://github.com/apache/beam/issues/26616 [Failing Test]: 
beam_PostCommit_Java_DataflowV2 SpannerReadIT multiple test failing
https://github.com/apache/beam/issues/26587 [Bug]: BigQuery Copy jobs do not 
set write disposition to WRITE_APPEND after first copy
https://github.com/apache/beam/issues/26550 [Failing Test]: 
beam_PostCommit_Java_PVR_Spark_Batch
https://github.com/apache/beam/issues/26547 [Failing Test]: 
beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey