Chronon + Beam

2024-05-07 Thread Varant Zanoyan
Hi Beam community,

Nikhil and I are two of the authors of the Chronon 
project, maintained by Airbnb and Stripe. We're planning on adding an
Enrichment Handler for Beam to Chronon soon, so we wanted to reach out and
see if anyone in this community would be interested in using it.

Specific things that Chronon can help with:

   - Performant time-windowed aggregations
   - Scalable backfills and low latency reads
   - Governance, management, and feature sharing/reuse

If you think Chronon might be helpful to you, please feel free to reply to
this thread and we can find a time to talk more!

Best,
Varant


Re: Job execution starts very slowly on Flink server.

2024-05-07 Thread Steve Niemitz via dev
This sounds about right to me.  When I was working on getting flink running
this was one of the biggest pain points, the upload/download process is
painfully slow.  We had implemented a jar cache in flink to improve this
(and stopped using fat jars)
https://github.com/twitter-forks/flink/commit/01f26ab7124b722ec767c8e01d10395afbbb0dc5
but
I don't think it ever made it upstream.

On Tue, May 7, 2024 at 10:32 AM Balogh, György  wrote:

> Hi,
> We are switching from embedded flink runner to filink server. The embedded
> flink runner works fine, jobs start instantly. However when we use the
> flink server we are experiencing about 30 seconds of idle time (no
> significant cpu, io load) before we see the job in the flink. We are
> targeting at most 2 seconds overhead to run ad-hoc query pipelines.
> We did some investigation and saw that the fat jar is about 150MB that was
> sent from the beam to the flink cluster. Still this does not justify the 30
> sec. We are trying to deploy dependencies to the flink workers and decrease
> the jar size. It's not clear at the moment what causes this delay.
>
> Flink version: 1.16
> Beam version: 2.55.1
>
> Thank you,
> Gyorgy
>
> --
>
> György Balogh
> CTO
> E gyorgy.bal...@ultinous.com 
> M +36 30 270 8342 <+36%2030%20270%208342>
> A HU, 1117 Budapest, Budafoki út 209.
> W www.ultinous.com
>


Job execution starts very slowly on Flink server.

2024-05-07 Thread Balogh , György
Hi,
We are switching from embedded flink runner to filink server. The embedded
flink runner works fine, jobs start instantly. However when we use the
flink server we are experiencing about 30 seconds of idle time (no
significant cpu, io load) before we see the job in the flink. We are
targeting at most 2 seconds overhead to run ad-hoc query pipelines.
We did some investigation and saw that the fat jar is about 150MB that was
sent from the beam to the flink cluster. Still this does not justify the 30
sec. We are trying to deploy dependencies to the flink workers and decrease
the jar size. It's not clear at the moment what causes this delay.

Flink version: 1.16
Beam version: 2.55.1

Thank you,
Gyorgy

-- 

György Balogh
CTO
E gyorgy.bal...@ultinous.com 
M +36 30 270 8342 <+36%2030%20270%208342>
A HU, 1117 Budapest, Budafoki út 209.
W www.ultinous.com


Beam High Priority Issue Report (54)

2024-05-07 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/31122 The PostCommit Go VR Flink job is 
flaky
https://github.com/apache/beam/issues/31074 The StressTests Java KafkaIO job is 
flaky
https://github.com/apache/beam/issues/31047 [Feature Request]: Support Type 
Inference on Python 3.12
https://github.com/apache/beam/issues/31040 [Bug]: ReadAllFiles does not fully 
read gzipped files from GCS
https://github.com/apache/beam/issues/30757 [Bug]: Beam Playground scio 
examples cannot run
https://github.com/apache/beam/issues/30737 [Failing Test]: Playground 
PreCommit failing goLint
https://github.com/apache/beam/issues/30644 The Inference Python Benchmarks 
Dataflow job is flaky
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is 
flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark 
Dataflow job is flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is 
flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance 
Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO 
Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python 
ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink 
Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink 
Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava 
Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is 
flaky
https://github.com/apache/beam/issues/30513 The PostCommit Python job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch 
job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner 
Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch 
job is flaky
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for 
large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may 
cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState