Statefun embedded functions - parallel per partition, sequential per key

2021-10-26 Thread Filip Karnicki
Hi, I have a Kafka topic with JSON messages that I map to protobufs within a data stream and then send to embedded stateful functions using the DataStream integration API (DataStream[RoutableMessage]). From there I need to make an idempotent, long-running, blocking IO call. I noticed that I w

Re: s3 access denied with flink-s3-fs-presto

2021-10-26 Thread Parag Somani
Hello, I have successfully been able to store data in an S3 bucket. Earlier, I used to have a similar issue. What you need to confirm: 1. The S3 bucket is created with RW access (irrespective of whether it is MinIO or AWS S3). 2. The "flink/opt/flink-s3-fs-presto-1.14.0.jar" jar is copied to the plugin directory of "flink

Re: Application mode - Custom Flink docker image with Python user code

2021-10-26 Thread Dian Fu
Hi Sumeet, PyFlink still does not provide special support for handling dependencies in Application mode. This means that the dependencies can be handled the same way as in the other deployment modes. However, it is indeed correct that the dependencies could be handled differently in applica

Re: Application mode - Custom Flink docker image with Python user code

2021-10-26 Thread Shuiqiang Chen
Hi Sumeet, Actually, running PyFlink jobs in application mode on Kubernetes has been supported since release 1.13. To build a Docker image with PyFlink installed, please refer to Enabling Python[1]. In order to run the Python code in application mode, you also need to COPY the code files into the

Re: FlinkKafkaProducer deprecated in 1.14 but pyflink binding still present?

2021-10-26 Thread Dian Fu
Hi Francis, Yes, you are right. It's still not updated in PyFlink, as KafkaSource/KafkaSink are still not supported in PyFlink. Hopefully we can add that support in 1.15 and then deprecate/remove the legacy interfaces. Regards, Dian On Tue, Oct 26, 2021 at 12:53 PM Francis Conroy < franc

Re: How to refresh topics to ingest with KafkaSource?

2021-10-26 Thread Mason Chen
Hi all, I have a similar requirement to Preston. I created https://issues.apache.org/jira/browse/FLINK-24660 to track this effort. Best, Mason > On Oct 18, 2021, at 1:59 AM, Arvid Heise wrote: > > Hi Preston, > > if you still need to set K

FlinkKafkaConsumer -> KafkaSource State Migration

2021-10-26 Thread Mason Chen
Hi all, I read these instructions for migrating to the KafkaSource: https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer . Do we need to employ any uid/allowNonRestoredState tricks if our Flink job is also stateful outside of the source

Re: s3 access denied with flink-s3-fs-presto

2021-10-26 Thread Vamshi G
s3a with the Hadoop S3 filesystem works fine for us with STS assume-role credentials and with KMS. Below is what our Hadoop s3a configs look like. Since the endpoint is globally whitelisted, we don't explicitly mention the endpoint. fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.Assumed

Re: Flink handle both kafka source and db source

2021-10-26 Thread Rafi Aroch
Hi, Take a look at the new 1.14 feature called Hybrid Source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/ Rafi On Tue, Oct 26, 2021 at 7:46 PM Qihua Yang wrote: > Hi, > > My flink app has two data sources. One is from a Kafka topic, one

Re: SplitEnumeratorContext callAsync() cleanup

2021-10-26 Thread Mason Chen
Hi Fabian, Unfortunately, I don't have the log since I was just testing it out on my local setup. I can try to reproduce it later in the week. Best, Mason On Mon, Oct 25, 2021 at 8:09 AM Fabian Paul wrote: > Hi Mason, > > Thanks for opening the ticket. Can you also share the log with us when t

Flink handle both kafka source and db source

2021-10-26 Thread Qihua Yang
Hi, My flink app has two data sources. One is from a Kafka topic, one is from a database by using the JDBC connector. Flink scans the full database table. Which mode should we use: batch mode or streaming mode? How do we know the database table is fully scanned? Will Flink throw any signal to show

Re: Flink support for Kafka versions

2021-10-26 Thread Prasanna kumar
Hi, We are using Kafka broker version 2.4.1.1. Also, the Kafka client 2.4.1.1 jar, which is part of the Flink Kafka connector, was recently marked as having a high-severity security issue. So we excluded the dependency and overrode it with the Kafka client 2.8.1 jar, and it works fine with the 2.4.1.1 broker. (since it

Re: Flink JDBC connect with secret

2021-10-26 Thread Qihua Yang
Hi Jing, Thank you for your suggestion. I will check whether SSL parameters in the URL work. Thanks, Qihua On Sat, Oct 23, 2021 at 8:37 PM JING ZHANG wrote: > Hi Qihua, > I checked user documents of several database vendors (postgres, oracle, > solidDB, SQL server)[1][2][3][4][5], and studied how to us

RE: Huge backpressure when using AggregateFunction with Session Window

2021-10-26 Thread Schwalbe Matthias
Hi Ori, … answering from remote … * If not completely mistaken, Scala Vector is immutable, creating a copy whenever you append, but * This is not the main problem, the vectors collected so far get deserialized with every incoming event (from state storage) and afterward serialized into

Re: Troubleshooting checkpoint timeout

2021-10-26 Thread Piotr Nowojski
I'm glad that I could help :) Piotrek On Mon, Oct 25, 2021 at 16:04, Alexis Sarda-Espinosa < alexis.sarda-espin...@microfocus.com> wrote: > Oh, I got it. I should’ve made the connection earlier after you said “Once > an operator decides to send/broadcast a checkpoint barrier downstream, it > jus

RE: Using POJOs with the table API

2021-10-26 Thread Alexis Sarda-Espinosa
Hello, I've found a ticket that talks about very high-level improvements to the Table API [1]. Are there any more concrete pointers for migration from DataSet to Table API? Will it be possible at all to use POJOs with the Table API? [1] https://issues.apache.org/jira/browse/FLINK-20787 Regards

Re: Not cleanup Kubernetes Configmaps after execution success

2021-10-26 Thread Roman Khachatryan
Thanks for sharing this. The sequence of events in the log seems strange to me: 2021-10-17 03:05:55,801 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection c1092812cfb2853a5576ffd78e346189: Stopping JobMaster for job 'rt-match_12.4.5_8d48b21a' (

Re: Async Performance

2021-10-26 Thread Arvid Heise
Hi Sanket, if you have a queue of 1000, then 1000 will be used in AsyncIO. Memory doesn't matter. What you need to double-check is if your async library can handle that many elements in parallel. The AsyncHttpClient should have a thread pool that effectively will put an upper limit on how many el

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-26 Thread Peter Schrott
Hi people, I found a workaround for that issue, which works at least for my use case. The main idea was customizing "org.apache.flink.formats.avro.registry.confluent.RegistryAvroFormatFactory" such that the expected Avro schema is not derived from the CREATE TABLE SQL statement, rather than pa

Application mode - Custom Flink docker image with Python user code

2021-10-26 Thread Sumeet Malhotra
Hi, I'm currently submitting my Python user code from my local machine to a Flink cluster running in Session mode on Kubernetes. For this, I have a custom Flink image with Python as per this reference [1]. Now, I'd like to move to using the Application mode with Native Kubernetes, where the user