Re: [PROPOSAL] Upgrade vendor grpc
Yes, thank you!

On Thu, Jan 11, 2024 at 8:21 PM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:

> Sounds good and thanks for doing this :)
>
> - Cham
>
> On Thu, Jan 11, 2024 at 8:06 AM Yi Hu via dev wrote:
>
>> Hi everyone,
>>
>> I would like to volunteer to upgrade the Beam vendored grpc, as requested
>> by the GitHub Issue [1]. The last update was in Apr 2023 [2]. There have
>> been vulnerabilities in its dependencies as well as potential oom issues
>> found since then (see [1]), and also to include grpc-alts [2].
>>
>> My plan is to follow the release process [3, 4], which involves preparing
>> for the release, building a candidate, voting and finalizing the release.
>> Then the vendored artifact is targeted to be integrated by Beam v2.54.0
>> onwards (cut date Jan 24, 2024).
>>
>> Please let me know if you have any comments/objections/questions.
>>
>> Thanks,
>>
>> Yi
>>
>> [1] https://github.com/apache/beam/issues/29861
>> [2] https://github.com/apache/beam/issues/25746
>> [3] https://github.com/apache/beam/tree/master/vendor
>> [4] https://docs.google.com/document/d/1ztEoyGkqq9ie5riQxRtMuBu3vb6BUO91mSMn1PU0pDA/edit#heading=h.vhcuqlttpnog
>>
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
Re: ByteBuddy DoFnInvokers Write Up
This is really great, and a very good idea to document. Going from "what does a DoFnSignature and DoFnInvoker look like for a particular DoFn" is super useful to even explain why these constructions exist. And from there, you can talk about what the bytecode looks like and what the ByteBuddy to generate it looks like.

Kenn

On Thu, Jan 11, 2024 at 12:26 PM Ismaël Mejía wrote:

> Neat! I remember passing long time trying to decipher the DoFnInvoker
> behavior so this will definitely be helpful.
>
> Maybe a good idea to add the link to the Design Documents list for future
> reference
> https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>
> On Wed, Jan 10, 2024 at 9:15 PM Robert Burke wrote:
>
>> That's neat! Thanks for writing that up!
>>
>> On Wed, Jan 10, 2024, 11:12 AM John Casey via dev
>> wrote:
>>
>>> The team at Google recently held an internal hackathon, and my hack
>>> involved modifying how our ByteBuddy DoFnInvokers work. My hack didn't end
>>> up going anywhere, but I learned a lot about how our code generation works.
>>> It turns out we have no documentation or design docs about our code
>>> generation, so I wrote up what I learned,
>>>
>>> Please take a look, and let me know if I got anything wrong, or if you
>>> are looking for more detail
>>>
>>> s.apache.org/beam-bytebuddy-dofninvoker
>>>
>>> John
Beam High Priority Issue Report (53)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29972 [Bug]: testHotKeyCombineWithSideInputs permared on Spark SparkStructuredStreaming runner
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29912 [Bug]: floatValueExtractor judge float and double equality directly
https://github.com/apache/beam/issues/29902 [Bug]: Messages are not ACK on Pubsub starting Beam 2.52.0 on Flink Runner in detached mode
https://github.com/apache/beam/issues/29825 [Bug]: Usage of logical types breaks Beam YAML Sql
https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github actions tests are failing due to update of pip
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing "beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://g