Re: Flink job performance

2024-04-15 Thread Oscar Perez via user
almost full, could you try allocating more CPUs and see if the > instability persists? > > Best, > Zhanghao Chen > -- > *From:* Oscar Perez > *Sent:* Monday, April 15, 2024 19:24 > *To:* Zhanghao Chen > *Cc:* Oscar Perez via user > *Subject:*

Impact on using clean code and serializing everything

2024-04-04 Thread Oscar Perez via user
Hi, We would like to adhere to clean code and expose all dependencies in the constructor of the process functions In flink, however, all dependencies passed to process functions must be serializable. Another workaround is to instantiate these dependencies in the open method of the process functio

How to list operators and see UID

2024-04-03 Thread Oscar Perez via user
Hei, We are facing an issue with one of the jobs in production where fails to map state from one deployment to another. I guess the problem is that we failed to set a UID and relies on the default of providing one based on hash Is it possible to see all operators / UIDs at a glance? What is the b

Sending key with the event

2024-01-23 Thread Oscar Perez via user
Hi flink experts! I have a question regarding apache flink. We want to send an event to a certain topic but for some reason we fail to send a proper key with the event. The event is published properly in the topic but the key for this event is null. I only see the method out.collect(event) to pu

Feature flag functionality on flink

2023-12-07 Thread Oscar Perez via user
Hi, We would like to enable sort of a feature flag functionality for flink jobs. The idea would be to use broadcast state reading from a configuration topic and then ALL operators with logic would listen to this state. This documentation: https://nightlies.apache.org/flink/flink-docs-release-1.1

Advice on checkpoint interval best practices

2023-12-05 Thread Oscar Perez via user
Hei, We are tuning some of the flink jobs we have in production and we would like to know what are the best numbers/considerations for checkpoint interval. We have set a default of 30 seconds for checkpoint interval and the checkpoint operation takes around 2 seconds. We have also enabled incremen

Metrics not available

2023-11-27 Thread Oscar Perez via user
Hi, We are using flink 1.16 and we woud like to monitor the state metrics of a certain job. Looking at the documentation I see that there are some state access latencies: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/metrics/ Namely, I would like to access the following: *

Checkpoint RMM

2023-11-27 Thread Oscar Perez via user
Hi, We have a long running job in production and we are trying to understand the metrics for this job, see attached screenshot. We have enabled incremental checkpoint for this job and we use RocksDB as a state backend. When deployed from fresh state, the initial checkpoint size is about* 2.41G*.

Re: Operator ids

2023-11-25 Thread Oscar Perez via user
You, unfortunately, just cant AFAIK On Sat, 25 Nov 2023 at 14:45, rania duni wrote: > Hello! > > I would like to know how can I get the operator ids of a running job. I > know how can I get the task ids but I want the operator ids! I couldn’t > find something to the REST API docs. > Thank you.

Doubts about state and table API

2023-11-24 Thread Oscar Perez via user
Hi, We are having a job in production where we use table API to join multiple topics. The query looks like this: SELECT * FROM topic1 AS t1 JOIN topic2 AS t2 ON t1.userId = t2.userId JOIN topic3 AS t3 ON t1.userId = t3.accountUserId This works and produces an EnrichedActivity any time any of t

Profiling on flink jobs

2023-11-09 Thread Oscar Perez via user
hi [image: :wave:] I am trying to do profiling on one of our flink jobs according to these docs: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/application_profiling/We are using OpenJDK 8.0. I am adding this line to the flink properties file in docker-compose: env.java.o

Python unit test cases approach for PyFlink 1.17.1

2023-10-05 Thread Perez
Hello Experts, I have developed a couple of modules where one of the modules is getting data from Kafka, applying the process window function, and getting the required form of data. However, I went through the official documentation and there are no approaches for implementing the unit test cases

Re: Pyflink unittest cases

2023-10-04 Thread Perez
. On Mon, Oct 2, 2023 at 9:21 PM joshua perez wrote: > Hello folks, > > Any help is appreciated. > > J. > > On Sat, Sep 30, 2023 at 1:47 PM joshua perez wrote: > >> Hi Team, >> >> We recently have started a use case where there would be involvement o

Difference between different timestamps

2023-10-04 Thread Perez
hello team, I want to understand if context.timestamp() of ProcessFunction is the same as context.current_processing_time()/ window().start of ProcessWindowFunction? Thanks.

Unable to implement custom tuple deserialiser

2023-10-04 Thread Perez
Hi Team, I am trying to implement the custom deserializer class for tuple type. I have explained my problem here on this link . Any help i

Re: Watermarks

2023-10-02 Thread Perez
mean something else? Thanks. On Mon, Oct 2, 2023 at 4:32 PM Perez wrote: > Hi Liu and Jinfeng, > > I am trying to implement KafkaDeserializationSchema for Pyflink but am > unable to get any examples. Can you share some links or references using > which I can understand and try

Re: Unable to read records from kafka

2023-10-02 Thread joshua perez
Hi Team, you can ignore this thread. I was able to resolve this. J. On Mon, Oct 2, 2023 at 8:40 PM joshua perez wrote: > Hi team, > > I am trying to read the records from the Kafka topic and below is my very > basic code as of now > > from pyflink.datastream.conne

Unable to read records from kafka

2023-10-02 Thread joshua perez
Hi team, I am trying to read the records from the Kafka topic and below is my very basic code as of now from pyflink.datastream.connectors.kafka import FlinkKafkaConsumer from pyflink.datastream.stream_execution_environment import StreamExecutionEnvironment, RuntimeExecutionMode from pyflink.comm

Re: Pyflink unittest cases

2023-10-02 Thread joshua perez
Hello folks, Any help is appreciated. J. On Sat, Sep 30, 2023 at 1:47 PM joshua perez wrote: > Hi Team, > > We recently have started a use case where there would be involvement of > Kafka and Flink's low level APIs like map and process functions and since I > am entirely

Re: Watermarks

2023-10-02 Thread Perez
Hi Liu and Jinfeng, I am trying to implement KafkaDeserializationSchema for Pyflink but am unable to get any examples. Can you share some links or references using which I can understand and try to implement myself? Perez sid.

Pyflink unittest cases

2023-09-30 Thread joshua perez
Hi Team, We recently have started a use case where there would be involvement of Kafka and Flink's low level APIs like map and process functions and since I am entirely new to these stuffs, I couldn't get the exact approach/way to test Kafka connector and low level APIs. So can anyone share any w

Re: Watermarks

2023-09-13 Thread Perez
Cool thanks for the clarification. Sid. On Mon, Sep 11, 2023 at 9:22 AM liu ron wrote: > Hi, Sid > > For the second question, I think it is not needed. > > Best, > Ron > > Feng Jin 于2023年9月9日周六 21:19写道: > >> hi Sid >> >> >> 1. You can customize KafkaDeserializationSchema[1], in the `deserializ

e2e tests with flink

2023-09-11 Thread Oscar Perez via user
Hi, I have a flink job which I want to test e2e. In the test I start flink minicluster and this reads from kafka topics in testcontainers. I m facing a problem that for some topics I have starting offset as latest and I want to publish these messages just after the job has been completely started

Re: Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
2023 at 16:38, Oscar Perez wrote: > Hei, > Tried adding flink-cients like this: > still same error :( > > implementation "org.apache.flink:flink-clients:${flinkVersion}" > > > On Fri, 8 Sept 2023 at 16:30, Alexey Novakov wrote: > >> Hi, >> >&

Re: Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
gt; The *flink-clients* dependency is only necessary to invoke the Flink > program locally (for example to run it standalone for testing and > debugging). > > Best regards, > Alexey > > On Fri, Sep 8, 2023 at 3:17 PM Oscar Perez via user > wrote: > >> Hei flink community

Problem when testing table API in local

2023-09-08 Thread Oscar Perez via user
Hei flink community, We are facing an issue with flink 1.15, 1.16 or 1.16.2 (I tried these 3 versions with same results, maybe it is more general) I am trying to test table API in local and for that I have the following dependencies in my job. See the list of dependencies at the bottom of this em

Access to collector in the process function

2023-08-30 Thread Oscar Perez via user
Hi! We would like to use hexagonal architecture in our design and treat the collector as an output port when sending events from the use case. For that, we would like to call an interface from the use case that effectively sends the event ultimately via out.collect The problem is that for instant

Dependency injection framework for flink

2023-08-01 Thread Oscar Perez via user
Hi, we are currently migrating some of our jobs into hexagonal architecture and I have seen that we can use spring as dependency injection framework, see: https://getindata.com/blog/writing-flink-jobs-using-spring-dependency-injection-framework/ Has anybody analyzed different JVM DI frameworks e.

Re: Using HybridSource

2023-07-05 Thread Oscar Perez via user
to them before combining them? >> >> On Tue, Jul 4, 2023, 23:53 Ken Krugler >> wrote: >> >>> Hi Oscar, >>> >>> Couldn’t you have both the Kafka and File sources return an Either>> from CSV File, Protobuf from Kafka>, and then (after the

Re: Using HybridSource

2023-07-04 Thread Oscar Perez via user
ed for lookup from the > "main" stream? > 2. Which API are you using? DataStream/SQL/Table or low level > ProcessFunction? > > Best, > Alex > > > On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user > wrote: > >> ok, but is it? As I said, both so

Re: Difference between different values for starting offset

2023-07-04 Thread Oscar Perez via user
ate, > there is no difference between the two strategies. This is because the > `auto.offset.reset` maps to the `OffsetResetStrategy` and > OffsetInitializer.earliest uses `earliest` too. > > Best, > Mason > > On Mon, Jul 3, 2023 at 6:56 AM Oscar Perez via user > wrote: > >

Re: Using HybridSource

2023-07-04 Thread Oscar Perez via user
; > I think HybridSource is the right solution. > > Best regards, > Alexey > > On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user > wrote: > >> Hei, We want to bootstrap some data from a CSV file before reading from a >> kafka topic that has a retention period of

Difference between different values for starting offset

2023-07-03 Thread Oscar Perez via user
Hei, Looking at the flink documentation for kafkasource I see the following values for starting offset: OffsetInitializer.earliest OffsetInitializer.latest OffsetInitializer.commitedOffset(OffsetResetStrategy.EARLIEST) >From what I understand OffsetInitializer.earliest uses earliest offset the f

Using HybridSource

2023-07-03 Thread Oscar Perez via user
Hei, We want to bootstrap some data from a CSV file before reading from a kafka topic that has a retention period of 7 days. We believe the best tool for that would be the HybridSource but the problem we are facing is that both datasources are of different nature. The KafkaSource returns a protobu