Re: PyFlink MapState with Types.ROW() throws exception

2023-10-05 Thread Elkhan Dadashov
() .build() ) import os os.environ["PYFLINK_GATEWAY_DISABLED"] = "0" On Wed, Oct 4, 2023 at 1:48 PM Elkhan Dadashov wrote: > Hi Flinkers, > > I'm trying to use MapState, where the value will be a list of 'pyflink.common.types.Row'> ty

PyFlink MapState with Types.ROW() throws exception

2023-10-04 Thread Elkhan Dadashov
Hi Flinkers, I'm trying to use MapState, where the value will be a list of type elements. Wanted to check if anyone else faced the same issue while trying to use MapState in PyFlink with complex types. Here is the code: from pyflink.common import Time from pyflink.common.typeinfo import Types

Metrics are not reported in Python UDF (used inside FlinkSQL) when exception is raised

2023-07-27 Thread Elkhan Dadashov
Hi Flinkers, Wanted to check if anyone else has faced this issue before: When Python UDF (which is used inside FlinkSQL) raises an exception, then metrics get lost and not reported. Facing this issue both in Flink 1.16.2 and FLink 1.17.1 (Python 3.9). If an exception is not raised, then metrics

Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-02 Thread Elkhan Dadashov
Hi Flink users, Wanted to check if any of you tried to run the local FlinkSQL client against JobManager running in the Kubernetes environment. For local FlinkSQL Client and local Flink cluster we set these params: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 To make it work, Is

How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-16 Thread Elkhan Dadashov
Hi Flink users, I have a basic Flnk pipeline, doing flatmap. inside flatmap, I get the input, path it to the client library to compute some result. That library execution takes around 30 seconds to 2 minutes (depending on the input ) for producing the output from the given input ( it is

ValueState with pure Java class keeping lists/map vs ListState/MapState, which one is a recommended way?

2020-01-17 Thread Elkhan Dadashov
Hi Flinkers, Was curious about if there is any performance(memory/speed) difference between these two options: in window process functions, when keeping state: *1) Create a single ValueState, and store state in pure Java objects* class MyClass { List listOtherClass; Map

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-30 Thread Elkhan Dadashov
tainers. We'd like to contribute it to the >>> community very soon. >>> >>> 3. We deploys a timeout timer for each launching container. If a task >>> manager does not register in time after its container has been launched, a >>> new container will be alloc

How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-29 Thread Elkhan Dadashov
Dear Flink developers, Having difficulty of getting a Flink job started. The job's uberjar/fat jar is around 400MB, and I need to kick 800+ containers. The default HDFS replication is 3. *The Yarn queue is empty, and 800 containers are allocated almost immediately by Yarn RM.* It takes

Will StreamingFileSink.forBulkFormat(...) support overriding OnCheckpointRollingPolicy?

2019-07-22 Thread Elkhan Dadashov
Hi folks, Will StreamingFileSink.forBulkFormat(...) support overriding OnCheckpointRollingPolicy? Does anyone use StreamingFileSink *with checkpoint disabled *for writing Parquet output files? The output parquet files are generated, but they are empty, and stay in *inprogress* state, even when

Any tutorial/example/blogpost/doc for Hive Source and Hive Sink with Flink streaming job?

2019-06-26 Thread Elkhan Dadashov
Hey Flink community, Just getting started with Flink. Wanted to ask if there is any tutorial/example/blogpost/doc for Hive Source and Hive Sink with Flink streaming job? Thanks.