Re: [PROPOSAL] Upgrade vendor grpc

2024-01-12 Thread Kenneth Knowles
Yes, thank you!

On Thu, Jan 11, 2024 at 8:21 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> Sounds good and thanks for doing this :)
>
> - Cham
>
> On Thu, Jan 11, 2024 at 8:06 AM Yi Hu via dev wrote:
>
>> Hi everyone,
>>
>> I would like to volunteer to upgrade the Beam vendored grpc, as requested
>> by the GitHub Issue [1]. The last update was in Apr 2023 [2]. Since then,
>> vulnerabilities in its dependencies and potential OOM issues have been
>> found (see [1]). The upgrade is also meant to include grpc-alts [2].
>>
>> My plan is to follow the release process [3, 4], which involves preparing
>> for the release, building a candidate, voting and finalizing the release.
>> The vendored artifact is then targeted for integration in Beam v2.54.0
>> onwards (cut date Jan 24, 2024).
>>
>> Please let me know if you have any comments/objections/questions.
>>
>> Thanks,
>>
>> Yi
>>
>> [1] https://github.com/apache/beam/issues/29861
>> [2] https://github.com/apache/beam/issues/25746
>> [3] https://github.com/apache/beam/tree/master/vendor
>> [4] https://docs.google.com/document/d/1ztEoyGkqq9ie5riQxRtMuBu3vb6BUO91mSMn1PU0pDA/edit#heading=h.vhcuqlttpnog
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
>>
>>
>>


Re: ByteBuddy DoFnInvokers Write Up

2024-01-12 Thread Kenneth Knowles
This is really great, and a very good idea to document. Starting from "what
does a DoFnSignature and DoFnInvoker look like for a particular DoFn" is
super useful, even just to explain why these constructions exist. And from
there, you can talk about what the bytecode looks like and what the ByteBuddy
code to generate it looks like.

Kenn

On Thu, Jan 11, 2024 at 12:26 PM Ismaël Mejía wrote:

> Neat! I remember spending a long time trying to decipher the DoFnInvoker
> behavior, so this will definitely be helpful.
>
> Maybe a good idea to add the link to the Design Documents list for future
> reference
> https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>
> On Wed, Jan 10, 2024 at 9:15 PM Robert Burke wrote:
>
>> That's neat! Thanks for writing that up!
>>
>> On Wed, Jan 10, 2024, 11:12 AM John Casey via dev wrote:
>>
>>> The team at Google recently held an internal hackathon, and my hack
>>> involved modifying how our ByteBuddy DoFnInvokers work. My hack didn't end
>>> up going anywhere, but I learned a lot about how our code generation works.
>>> It turns out we have no documentation or design docs about our code
>>> generation, so I wrote up what I learned.
>>>
>>> Please take a look, and let me know if I got anything wrong, or if you
>>> are looking for more detail.
>>>
>>> s.apache.org/beam-bytebuddy-dofninvoker
>>>
>>> John
>>>
>>


Beam High Priority Issue Report (53)

2024-01-12 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29972 [Bug]: 
testHotKeyCombineWithSideInputs permared on Spark SparkStructuredStreaming 
runner
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for 
large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may 
cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29912 [Bug]: floatValueExtractor judge 
float and double equality directly
https://github.com/apache/beam/issues/29902 [Bug]: Messages are not ACK on 
Pubsub starting Beam 2.52.0 on Flink Runner in detached mode
https://github.com/apache/beam/issues/29825 [Bug]: Usage of logical types 
breaks Beam YAML Sql
https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 
with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github 
actions tests are failing due to update of pip 
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get 
stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://g