Re: Flink job performance

2024-04-15 Thread Oscar Perez via user
almost full, could you try allocating more CPUs and see if the > instability persists? > > Best, > Zhanghao Chen > -- > *From:* Oscar Perez > *Sent:* Monday, April 15, 2024 19:24 > *To:* Zhanghao Chen > *Cc:* Oscar Perez via user > *Subje

Impact on using clean code and serializing everything

2024-04-04 Thread Oscar Perez via user
Hi, We would like to adhere to clean code and expose all dependencies in the constructor of the process functions. In Flink, however, all dependencies passed to process functions must be serializable. Another workaround is to instantiate these dependencies in the open method of the process
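
A minimal sketch of the workaround mentioned in this message, using only the JDK rather than Flink itself. It assumes the function object is shipped with plain Java serialization; `EnrichFunction`, `Formatter`, and the no-argument `open()` are hypothetical stand-ins, not Flink classes. Serializable configuration goes through the constructor, while the non-serializable dependency is a `transient` field built in `open()` after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDependencyDemo {

    // Hypothetical dependency that does NOT implement Serializable.
    static class Formatter {
        String format(String s) {
            return "[" + s + "]";
        }
    }

    // Hypothetical stand-in for a process function (not a Flink class).
    static class EnrichFunction implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String prefix;           // serializable config, injected via the constructor
        private transient Formatter formatter; // skipped by serialization, created in open()

        EnrichFunction(String prefix) {
            this.prefix = prefix;
        }

        // Assumed lifecycle hook: called once on the worker after deserialization.
        void open() {
            formatter = new Formatter();
        }

        String process(String value) {
            return formatter.format(prefix + value);
        }
    }

    // Serializes and deserializes the function, imitating shipping it to a worker.
    static EnrichFunction roundTrip(EnrichFunction fn) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EnrichFunction) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        EnrichFunction shipped = roundTrip(new EnrichFunction("user-"));
        shipped.open(); // the transient field is null until open() runs
        System.out.println(shipped.process("42"));
    }
}
```

The trade-off is that the dependency no longer appears in the constructor signature, which is the clean-code concern raised above. One possible middle ground is to inject a small serializable factory through the constructor and call it from `open()`.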

How to list operators and see UID

2024-04-03 Thread Oscar Perez via user
Hei, We are facing an issue with one of the jobs in production where it fails to map state from one deployment to another. I guess the problem is that we failed to set a UID and rely on the default of providing one based on a hash. Is it possible to see all operators / UIDs at a glance? What is the

Sending key with the event

2024-01-23 Thread Oscar Perez via user
Hi flink experts! I have a question regarding apache flink. We want to send an event to a certain topic but for some reason we fail to send a proper key with the event. The event is published properly in the topic but the key for this event is null. I only see the method out.collect(event) to

Feature flag functionality on flink

2023-12-07 Thread Oscar Perez via user
Hi, We would like to enable sort of a feature flag functionality for flink jobs. The idea would be to use broadcast state reading from a configuration topic and then ALL operators with logic would listen to this state. This documentation:

Advice on checkpoint interval best practices

2023-12-05 Thread Oscar Perez via user
Hei, We are tuning some of the flink jobs we have in production and we would like to know what are the best numbers/considerations for checkpoint interval. We have set a default of 30 seconds for checkpoint interval and the checkpoint operation takes around 2 seconds. We have also enabled

Metrics not available

2023-11-27 Thread Oscar Perez via user
Hi, We are using flink 1.16 and we would like to monitor the state metrics of a certain job. Looking at the documentation I see that there are some state access latencies: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/metrics/ Namely, I would like to access the following:

Checkpoint RMM

2023-11-27 Thread Oscar Perez via user
Hi, We have a long running job in production and we are trying to understand the metrics for this job, see attached screenshot. We have enabled incremental checkpoint for this job and we use RocksDB as a state backend. When deployed from fresh state, the initial checkpoint size is about*

Re: Operator ids

2023-11-25 Thread Oscar Perez via user
You, unfortunately, just can't AFAIK On Sat, 25 Nov 2023 at 14:45, rania duni wrote: > Hello! > > I would like to know how can I get the operator ids of a running job. I > know how can I get the task ids but I want the operator ids! I couldn’t > find something to the REST API docs. > Thank you.

Doubts about state and table API

2023-11-24 Thread Oscar Perez via user
Hi, We are having a job in production where we use table API to join multiple topics. The query looks like this: SELECT * FROM topic1 AS t1 JOIN topic2 AS t2 ON t1.userId = t2.userId JOIN topic3 AS t3 ON t1.userId = t3.accountUserId This works and produces an EnrichedActivity any time any of

Profiling on flink jobs

2023-11-09 Thread Oscar Perez via user
Hi, I am trying to do profiling on one of our flink jobs according to these docs: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/application_profiling/ We are using OpenJDK 8.0. I am adding this line to the flink properties file in docker-compose:

e2e tests with flink

2023-09-11 Thread Oscar Perez via user
Hi, I have a flink job which I want to test e2e. In the test I start flink minicluster and this reads from kafka topics in testcontainers. I'm facing a problem that for some topics I have starting offset as latest and I want to publish these messages just after the job has been completely

Re: Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
> You would need to add the flink-clients module when running in local mode. >> The *flink-clients* dependency is only necessary to invoke the Flink >> program locally (for example to run it standalone for testing and >> debugging). >> >> Best regards, >> Alexey

Re: Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
> The *flink-clients* dependency is only necessary to invoke the Flink > program locally (for example to run it standalone for testing and > debugging). > > Best regards, > Alexey > > On Fri, Sep 8, 2023 at 3:17 PM Oscar Perez via user > wrote: > >> Hei flink commun

Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
Hei flink community, We are facing an issue with flink 1.15, 1.16 or 1.16.2 (I tried these 3 versions with same results, maybe it is more general) I am trying to test table API in local and for that I have the following dependencies in my job. See the list of dependencies at the bottom of this

Access to collector in the process function

2023-08-30 Thread Oscar Perez via user
Hi! We would like to use hexagonal architecture in our design and treat the collector as an output port when sending events from the use case. For that, we would like to call an interface from the use case that effectively sends the event ultimately via out.collect The problem is that for

Dependency injection framework for flink

2023-08-01 Thread Oscar Perez via user
Hi, we are currently migrating some of our jobs into hexagonal architecture and I have seen that we can use spring as dependency injection framework, see: https://getindata.com/blog/writing-flink-jobs-using-spring-dependency-injection-framework/ Has anybody analyzed different JVM DI frameworks

Re: Using HybridSource

2023-07-05 Thread Oscar Perez via user
to them before combining them? >> >> On Tue, Jul 4, 2023, 23:53 Ken Krugler >> wrote: >> >>> Hi Oscar, >>> >>> Couldn’t you have both the Kafka and File sources return an Either>> from CSV File, Protobuf from Kafka>, and then (after the

Re: Using HybridSource

2023-07-04 Thread Oscar Perez via user
needed for lookup from the > "main" stream? > 2. Which API are you using? DataStream/SQL/Table or low level > ProcessFunction? > > Best, > Alex > > > On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user > wrote: > >> ok, but is it? As I said, both so

Re: Difference between different values for starting offset

2023-07-04 Thread Oscar Perez via user
set in Flink state, > there is no difference between the two strategies. This is because the > `auto.offset.reset` maps to the `OffsetResetStrategy` and > OffsetInitializer.earliest uses `earliest` too. > > Best, > Mason > > On Mon, Jul 3, 2023 at 6:56 AM Oscar Perez via user > wrote: >

Re: Using HybridSource

2023-07-04 Thread Oscar Perez via user
> > I think HybridSource is the right solution. > > Best regards, > Alexey > > On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user > wrote: > >> Hei, We want to bootstrap some data from a CSV file before reading from a >> kafka topic that has a retention per

Difference between different values for starting offset

2023-07-03 Thread Oscar Perez via user
Hei, Looking at the flink documentation for kafkasource I see the following values for starting offset: OffsetInitializer.earliest OffsetInitializer.latest OffsetInitializer.commitedOffset(OffsetResetStrategy.EARLIEST) From what I understand OffsetInitializer.earliest uses earliest offset the

Using HybridSource

2023-07-03 Thread Oscar Perez via user
Hei, We want to bootstrap some data from a CSV file before reading from a kafka topic that has a retention period of 7 days. We believe the best tool for that would be the HybridSource but the problem we are facing is that both datasources are of different nature. The KafkaSource returns a