Re: Redis as a State Backend

2024-02-14 Thread David Morávek
Here is my "little harsh/straightforward feedback", but it's based on fact and real-world experience with using Redis since ~2012. Redis is not a database, period. The best description of what Redis is is something along the lines of "in-memory - text only (base64 ftw) - data structures on top of

Re: Elastic Block Store as checkpoint storage

2023-07-20 Thread David Morávek
Using EBS as checkpoint storage doesn't work in a distributed environment if you need to move the state between TMs (e.g., for rescaling and non-local recovery). You'd need something along the lines of RW multi-attach and set up the volumes in a smart way; it won't be easy to set up; I'm not aware

Re: The JobManager is taking minutes to complete and finalize checkpoints despite the Task Managers seem to complete them in a few seconds

2023-05-02 Thread David Morávek
Hi Francesco, Finalization also includes the removal of older checkpoints (we're only keeping the last N checkpoints), which could be pretty costly in the case of RocksDB (many small files). Can you check how long the removal of old checkpoint files from S3 takes (there might be some rate

Re: Task Failure Strategy for Adaptive Scheduler

2023-04-18 Thread David Morávek
ing jobs ? If it can, local recovery makes sense at that time I > believe. > > Thanks > > > On Wed, Apr 12, 2023 at 2:15 AM David Morávek > wrote: > >> Hi Talat, >> >> For most streaming pipelines, we have to restart the whole pipeline no >&g

Re: [Discussion] - Take findify/flink-scala-api under Flink umbrella

2023-04-16 Thread David Morávek
cc dev@f.a.o On Sun, Apr 16, 2023 at 11:42 AM David Morávek wrote: > Hi Alexey, > > I'm a bit skeptical because, looking at the project, I see a couple of red > flags: > > - The project is inactive. The last release and commit are both from the > last May. > - The proj

Re: [Discussion] - Take findify/flink-scala-api under Flink umbrella

2023-04-16 Thread David Morávek
Hi Alexey, I'm a bit skeptical because, looking at the project, I see a couple of red flags: - The project is inactive. The last release and commit are both from the last May. - The project has not been adapted for the last two Flink versions, which signals a lack of users. - All commits are by

Re: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-07 Thread David Morávek
rialization-tuning-vol.-1-choosing-your-serializer-if-you-can/ > > > > > > > > *From:* Tommy May > *Sent:* Tuesday, March 7, 2023 3:25 AM > *To:* David Morávek > *Cc:* Ken Krugler ; Flink User List < > user@flink.apache.org> > *Subject:* Re: Avoiding data s

Re: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-06 Thread David Morávek
Using an operator state for a stateful join isn't great because it's meant to hold only a minimal state related to the operator (e.g., partition tracking). If your data are already pre-partitioned and the partitioning matches (hash partitioning on the JAVA representation of the key yielded by the

Re: Flink UI in Application Mode

2022-05-23 Thread David Morávek
Hi Zain, you can find a link to web-ui either in the CLI output after the job submission or in the YARN ResourceManager web ui [1]. With YARN Flink needs to choose the application master port at random (could be somehow controlled by setting _yarn.application-master.port_) as there might be

Re: Reactive mode and checkpointing

2022-04-12 Thread David Morávek
Hi Aryan, this is an interesting thought. What kind of option do you have in mind? My take on this is that if checkpoint times out, it's pretty likely that the next one will timeout as well and the scheduler has no way of knowing that the next one would succeed. Also up-scaling might help to

Re: Wrong format when passing arguments with space

2022-03-29 Thread David Morávek
cc Kevin On Tue, Mar 29, 2022 at 9:15 AM David Morávek wrote: > Hi Kevin, > > -dev@f.a.o +user@f.a.o > > Thanks for the report! I've run some experiments and unfortunately I'm not > able to reproduce the behavior you're describing. The bash "$@" expansion > se

Re: Wrong format when passing arguments with space

2022-03-29 Thread David Morávek
Hi Kevin, -dev@f.a.o +user@f.a.o Thanks for the report! I've run some experiments and unfortunately I'm not able to reproduce the behavior you're describing. The bash "$@" expansion seems to work as expected (always receiving correctly expanded unquoted strings in the main class). Can you maybe

Re: flink cluster startup time

2022-03-28 Thread David Morávek
Hi Frank, I'm not really familiar with the internal workings of the Spotify's operator, but here are few general notes: - You only need the JM process for the REST API to become available (TMs can join in asynchronously). I'd personally aim for < 1m for this step, if it takes longer it could

Re: How to proper hashCode() for keys.

2022-02-07 Thread David Morávek
> > The key selector works. No it does not ;) It depends on the system time so it's not deterministic (you can get different keys for the very same element). How do you key a count based on the time. I have taken this from samples > online. > This is what the windowing is for. You basically

Re: Uploading jar to s3 for persistence

2022-01-10 Thread David Morávek
I understand the issue. We currently don't have a good mechanism for this kind of external file management (we need to avoid leaking resources) :( Even right now, we kind of rely on upload directory being cleaned up by the cluster manager (yarn, k8s), because it's tied with a container lifecycle.

Re: Uploading jar to s3 for persistence

2022-01-10 Thread David Morávek
Hi Puneet, this is a known limitation and unfortunately `web.upload.dir` currently works only with the local system :( There are multiple issues covering this already, I guess FLINK-16544 [1] summarizes the current state well. This is something we want to address with the future releases. We've

Re: CVE-2021-44228 - Log4j2 vulnerability

2022-01-09 Thread David Morávek
you please let us know what will > be the regular release timing for 1.12.8 version. > > > > Regards, > > Suchithra > > > > *From:* David Morávek > *Sent:* Sunday, January 9, 2022 12:11 AM > *To:* V N, Suchithra (Nokia - IN/Bangalore) > *Cc:* Chesnay Schepler ; Mar

Re: CVE-2021-44228 - Log4j2 vulnerability

2022-01-08 Thread David Morávek
anks, > > Suchithra > > > > *From:* Martijn Visser > *Sent:* Thursday, January 6, 2022 7:45 PM > *To:* patrick.eif...@sony.com > *Cc:* David Morávek ; swamy.haj...@gmail.com; > subharaj.ma...@gmail.com; V N, Suchithra (Nokia - IN/Bangalore) < > suchithra@no

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-08 Thread David Morávek
> > What I understand from the docs [1] and David A. answer on SO [2] is, that > I have to create for my POJOs with Lists and Maps TypeInfos. > That is just to avoid falling back to Kryo serialization, which is less effective, but this IMO shouldn't break your application. There might be of

Re: Windowing on the consumer side

2022-01-08 Thread David Morávek
I just have to answer to this. This was again cross posted on stackoverflow [1]. I think you seriously need to rethink your behavior here. The cross posting is one thing, but creating a second "fake" email, so you can repeat the behavior you've been discouraged from [2], makes me feel that you

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread David Morávek
y needs to be done? > > Thanks, > Martin O. > > On Sat, Jan 8, 2022 at 12:19 AM David Morávek wrote: > >> Hi Siddhesh, >> >> any JVM based language (Java, Scala, Kotlin) compiles into a byte-code >> that can be executed by the JVM. As the JVM was evolving over

Re: Avro BulkFormat for the new FileSource API?

2022-01-07 Thread David Morávek
Hi Kevin, I'm not as familiar with initiatives around the new sources, but it seems that the BulkFormat for Avro [1] has been added recently and will be released with the Flink 1.15.x. [1] https://issues.apache.org/jira/browse/FLINK-24565 Best, D. On Fri, Jan 7, 2022 at 7:23 PM Kevin Lam

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread David Morávek
Hi Siddhesh, any JVM based language (Java, Scala, Kotlin) compiles into a byte-code that can be executed by the JVM. As the JVM was evolving over the years, new versions of byte code have been introduced. Target version simply refers the the version of bytecode the compiler should generate. How

Re: Exactly Once Semantics

2022-01-07 Thread David Morávek
similar problems". >> >> It's okay if you don't want to help. >> >> Cheers! >> >> Sid >> >> On Fri, Jan 7, 2022 at 8:18 PM David Morávek wrote: >> >>> Hi Siddhesh, >>> >>> can you please focus your questions on

Re: Exactly Once Semantics

2022-01-07 Thread David Morávek
Hi Siddhesh, can you please focus your questions on one channel only? (either SO or the ML) this could lead to unnecessary work duplication (which would be shame, because the community has limited resources) as people answering on SO might not be aware of the ML thread D. On Fri, Jan 7, 2022

Re: [ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread David Morávek
Great job! <3 Thanks Dong and Yun for managing the release and big thanks to everyone who has contributed! Best, D. On Fri, Jan 7, 2022 at 2:27 PM Yun Gao wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink ML 2.0.0. > > > > Apache Flink ML provides API

Re: [ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread David Morávek
Great job! <3 Thanks Dong and Yun for managing the release and big thanks to everyone who has contributed! Best, D. On Fri, Jan 7, 2022 at 2:27 PM Yun Gao wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink ML 2.0.0. > > > > Apache Flink ML provides API

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
x the issue. > If yes, that would assume - for me - that its a bug with > reinterpretAsKeyedStream and the elastic scaling. > If no, its probably another issue caused by my code, instead of Flink. > > BR > Martin > > David Morávek schrieb am 07.01.2022 10:22 (GMT +01:00): > > W

Re: Metaspace OOM : class loaders not being GC

2022-01-07 Thread David Morávek
Hi David, If I understand the problem correctly, there is really nothing we can do here. Soft references are garbage collected when there is a high memory pressure and the garbage collector needs to free up more memory. The problem here is that the GC doesn't really take high memory pressure on

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
t;68731ce4-5f11-11ec-a5cd-0255ac100047","recordSequenceNumber":5 > > The percentage is then the number of output records which uses a already > given sequence number (for each key1) compared to all output records. > > > Right now I change the flink job so, that instea

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
Hi Martin, _reinterpretAsKeyedStream_ should definitely work with the reactive mode, if it doesn't it's a bug that needs to be fixed > For test use cases (3) and (4) the state of the keyed process function (c) > seems only available for around 50% of the events processed after > scale-in/fail. >

Re: S3 server side encryption using FileSink

2022-01-05 Thread David Morávek
Hi James, I'm not an expert on s3, but in general this should be a matter of configuring the s3 filesystem implementation that Flink is using (that's what ends up writing the actual files to s3). Flink currently comes with the Hadoop & Presto (also kind of Hadoop based) based implementations.

Re: Pod Disruption in Flink Kubernetes Cluster

2022-01-05 Thread David Morávek
Hi Tianyi, this really depends on your kubernetes setup (eg. if autoscaling is enabled, you're using spot / preemtible instances). In general applications that run on Kubernetes needs be resilient to these kind of failures, Flink is no exception. In case of the failure, Flink needs to restart

Re: Flink native k8s integration vs. operator

2022-01-04 Thread David Morávek
Hi Thomas, AFAIK there are no specific plans in this direction with the native integration, but I'd like to share some thoughts on the topic In my understanding there are three major groups of workloads in Flink: 1) Batch workloads 2) Interactive workloads (Both Batch and Streaming; eg. SQL

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread David Morávek
Rohrmann wrote: > If there are no users strongly objecting to dropping Hadoop support for < > 2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong > said. > > Cheers, > Till > > On Wed, Dec 22, 2021 at 10:33 AM David Morávek wrote: > > > Agreed,

Re: TypeInformation | Flink

2021-12-29 Thread David Morávek
-in-scala On Wed, Dec 29, 2021 at 5:59 PM Siddhesh Kalgaonkar < kalgaonkarsiddh...@gmail.com> wrote: > I have modified my question based on Dominik's inputs. Can somebody help > to take it forward? > > Thanks, > Siddhesh > > On Wed, Dec 29, 2021 at 3:32 PM David Morávek wr

Re: Mapstate got wrong UK when restored.

2021-12-29 Thread David Morávek
an not find > back the UK. I think the key of (key, mapstate(UK,UV)) will be implictly > added when write or read from the state by flink. > So, I am still not clear why I get the key but not the UK. > > Yours > Josh > > David Morávek 于2021年12月29日周三 17:32写道: > >> Hi Jo

Re: Flink and Datadog metrics question

2021-12-29 Thread David Morávek
Hi Adrian, for accessing metrics in your UDFs (user defined functions), you need access to the runtime context, unless you're using lower level APIs that directly extend stream operator. There are "rich" versions of most of the higher level functions (map, flatmap, group combine, ...). Can you

Re: Re: Read parquet data from S3 with Flink 1.12

2021-12-29 Thread David Morávek
I've answered in other thread [1]. Please keep the conversation focused there. [1] https://lists.apache.org/thread/7cqqzno3lz75qw9yxprgg45q6voonsbq Best, D. On Tue, Dec 28, 2021 at 4:00 PM Rohan Kumar wrote: > Hi Alexandre, I am also facing the same issue. Please let us know if you > are able

Re: Unable to read S3 data using the filesystem connector

2021-12-29 Thread David Morávek
Hi Rohan, setting this up is currently not really straightforward :( I think we need to improve on this. For supporting the s3 filesystem, you did it right by placing s3 jars into the plugins directory. Please note, that these are loaded in a separate class loader and also contain a shaded

Re: TypeInformation | Flink

2021-12-29 Thread David Morávek
inik, can you help me with the process functions. How can I > use it for my use case? > > Thanks, > Siddhesh > > On Wed, Dec 29, 2021 at 2:50 PM David Morávek wrote: > >> Hi Siddhesh, >> >> it seems that the question is already being answered

Re: Mapstate got wrong UK when restored.

2021-12-29 Thread David Morávek
Hi Josh, it's important bit to understand is that the MapState (or any other keyed state) is scoped per *key* [1]. You can think about it in a way, that for each key you have a separate "map" that backs it. This is the important concept behind distributed stream processing, that allows you to

Re: TypeInformation | Flink

2021-12-29 Thread David Morávek
Hi Siddhesh, it seems that the question is already being answered in the SO thread, so let's keep the discussion focused there. Looking at the original question, I think it's important to understand, that the TypeInformation is not meant to be used for "runtime" matching, but to address the type

Re: Anyone trying to adopt Scotty on the recent Flink versions?

2021-12-29 Thread David Morávek
Hi Dongwon, Scotty's approach to sliding windows seems really interesting ;) Looking at the code [1], it seems to be no longer maintained. It's both compiled and tested against Flink 1.8 so I wouldn't really expect it to be compatible with 1.14.x :( [1]

Re: Remove stackTrace from error response

2021-12-29 Thread David Morávek
Hi Noa, There is currently no way to do this without making changes to the code. Please note that there are also endpoints for explicitly retrieving the exception history of a particular job. Flink REST API is not really meant to be "secure" in a way that you can make it accessible to the public

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-29 Thread David Morávek
wrote: > Hi folks, > > When can we expect the release to be made available to the community? > > On Wed, Dec 22, 2021 at 3:07 PM David Morávek wrote: > >> Hi Debraj, >> >> we're currently not planning another emergency release as this CVE is not >> as cr

Re: Avoiding Dynamic Classloading for User Code

2021-12-23 Thread David Morávek
k 13.1 > -- > *From:* David Morávek > *Sent:* Thursday, December 23, 2021 1:44 PM > *To:* Lior Liviev > *Cc:* user > *Subject:* Re: Avoiding Dynamic Classloading for User Code > > > *CAUTION*: external source > Please try to post the whole stac

Re: Avoiding Dynamic Classloading for User Code

2021-12-23 Thread David Morávek
ng avro 1.10 > -- > *From:* David Morávek > *Sent:* Thursday, December 23, 2021 12:37 PM > *To:* Lior Liviev ; user > *Subject:* Re: Avoiding Dynamic Classloading for User Code > > > *CAUTION*: external source > I guess I'd need more context

Re: Avoiding Dynamic Classloading for User Code

2021-12-23 Thread David Morávek
2. If I have my JAR in the folder AND I load same JAR via REST API, >will I run into problems? (class loading strategy is set to parent-first) > > -- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 6:53 PM > *To:* Lior Liviev > *Cc:

Re: Avoiding Dynamic Classloading for User Code

2021-12-22 Thread David Morávek
PI to >load it? >2. If I have my JAR in the folder AND I load same JAR via REST API, >will I run into problems? (class loading strategy is set to parent-first) > > ---------- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 6:53 PM > *To:

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-22 Thread David Morávek
Hi Debraj, we're currently not planning another emergency release as this CVE is not as critical for Flink users as the previous one. However, this patch will be included in all upcoming patch & minor releases. The patch release for the 1.14.x branch is already in progress [1] (it may be bit

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-22 Thread David Morávek
Agreed, if we drop the CI for lower versions, there is actually no point of having safeguards as we can't really test for them. Maybe one more thought (it's more of a feeling), I feel that users running really old Hadoop versions are usually slower to adopt (they most likely use what the current

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
t; anything else? > > Get Outlook for iOS <https://aka.ms/o0ukef> > ---------- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 6:39:10 PM > *To:* Lior Liviev > *Cc:* user > *Subject:* Re: Avoiding Dynamic Classloading for Us

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread David Morávek
CC user@f.a.o Is anyone aware of something that blocks us from doing the upgrade? D. On Tue, Dec 21, 2021 at 5:50 PM David Morávek wrote: > Hi Martijn, > > from person experience, most Hadoop users are lagging behind the release > lines by a lot, because upgrading a Ha

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
or iOS <https://aka.ms/o0ukef> > ---------- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 6:08:51 PM > *To:* Lior Liviev ; user > *Subject:* Re: Avoiding Dynamic Classloading for User Code > > > *CAUTION*: external source > Please alw

Re: Class loader

2021-12-21 Thread David Morávek
now if > it will resolve my oom metaspace problem > > Get Outlook for iOS <https://aka.ms/o0ukef> > -- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 5:59:05 PM > *To:* Lior Liviev ; user > *Subject:* Re: Class loader > >

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
ing the jar in > flink/lib > ------ > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 5:43 PM > *To:* Lior Liviev > *Cc:* user@flink.apache.org > *Subject:* Re: Avoiding Dynamic Classloading for User Code > > > *CAUTION*: external source > Hi Li

Re: Class loader

2021-12-21 Thread David Morávek
lly no, it's a separate question > -- > *From:* David Morávek > *Sent:* Tuesday, December 21, 2021 5:45 PM > *To:* Lior Liviev > *Cc:* user@flink.apache.org > *Subject:* Re: Class loader > > > *CAUTION*: external source > I assume this is a d

Re: Class loader

2021-12-21 Thread David Morávek
I assume this is a duplicate of the previous thread [1] [1] https://lists.apache.org/thread/16kxytrqycolfwfmr5tv0g6bq9m2wvof Best, D. On Tue, Dec 21, 2021 at 3:53 PM Lior Liviev wrote: > Hello, I wanted to know if I have my user code Jar in Flink, and I'm > running it 3 times, will the class

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
Hi Lior, can you please provide details about the steps (I'm not sure what load jar / execute with the API means)? are you submitting the job using the REST API or Flink CLI? I assume you're using a session cluster. also what is the concern here? do you run into any class-loading related issues?

Re: How to know if Job nodes are registered in cluster?

2021-12-21 Thread David Morávek
Hi John, there is usually no need to run multiple JM, if you're able to start a new one quickly after failure (eg. when you're running on kubernetes). There is always only single active leader and other JMs effectively do nothing besides competing for the leadership. Zookeeper based HA uses the

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
few steps*. We will walk > you through the necessary commands and show how to validate that everything > is running correctly. > > Which is in fact not exactly true > > The compose command failse. > > I think that instructions need to be added to fix this. > > > Regards Ha

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
021-12-21 10:06:39 (4.42 MB/s) - ‘flink-table-api-java-1.13.1.pom’ > saved [5394/5394] > > Only setting /root/.m2/settings.xml correctly helps > And that is what I don't understand as the proxy configuration will not be > the same for everyone. > > Regards Hans-Peter >

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
Hello Hans, it's DNS ;) You need to make sure, that "repo.maven.apache.org" can be resolved from your docker container (you can use tools such as host, dig, nslookup to verify that). This is may be tricky to debug, unless you're familiar with networking. A good place to start might be checking

Re: question on jar compatibility - log4j related

2021-12-19 Thread David Morávek
Hi Eddie, the APIs should be binary compatible across patch releases, so there is no need to re-compile your artifacts Best, D. On Sun 19. 12. 2021 at 16:42, Colletta, Edward wrote: > If have jar files built using flink version 11.2 in dependencies, and I > upgrade my cluster to 11.6, is it

Re: How do I determine which hardware device and software has log4j zero-day security vulnerability?

2021-12-16 Thread David Morávek
Hi Turritopsis, I fail to see any relation to Apache Flink. Can you please elaborate on how Flink fits into it? Best, D. On Thu, Dec 16, 2021 at 3:52 PM Turritopsis Dohrnii Teo En Ming < ceo.teo.en.m...@gmail.com> wrote: > Subject: How do I determine which hardware device and software has >

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

2021-12-14 Thread David Morávek
Any other thoughts on the topic? If there are no concerns, I'd continue with creating a FLIP for changing the "written" contract of the Flink FileSystems to reflect this. Best, D. On Wed, Dec 8, 2021 at 5:53 PM David Morávek wrote: > Hi Martijn, > > I simply wasn'

Re: Unable to create new native thread error

2021-12-13 Thread David Morávek
ockers are > dedicated only to the cluster. > The configuration of the docker are pulled from the hosts so same number > of threads is configured on the task and job managers. > > > > Kind regards, > > Ilan > > > > *From: *David Morávek > *Date: *Monday, 13 December 2021

Re: Unable to create new native thread error

2021-12-13 Thread David Morávek
stractPlainSocketImpl.java:188)\n', > > > > java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n', > > java.net.Socket.connect(Socket.java:607)\n', > > > org.apache.flink.runtime.blob.BlobClient.(BlobClient.java:96)\n', > > \t... 14 more\n',

Re: FW: EXT: Re: Issues in Batch Jobs Submission for a Session Cluster

2021-12-13 Thread David Morávek
) < jay.gh...@ge.com> wrote: > Hi @David Morávek , > > > > PFA details regarding memory config in the configmap we have set and > corresponding usage details in terms of cpu,mem and jvm when the issue > happens. > > > > Credits: @R, Aromal (GE Healthcare, consul

Re: Flink 1.13.3, k8s HA - ResourceManager was revoked leadership

2021-12-13 Thread David Morávek
Hi Alexey, please be aware that the json-based logs in the mail may not make it pass the spam filter (at least for gmail they did not) :( K8s based leader election is based on optimistic locking of the underlying config-map (~ periodically updating the lease annotation of the config-map). If JM

Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread David Morávek
+1, agreed with Seth's reasoning. There has been no real activity in MapR FS module for years [1], so the eventual users should be good with using the jars from the older Flink versions for quite some time [1] https://github.com/apache/flink/commits/master/flink-filesystems/flink-mapr-fs Best,

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

2021-12-08 Thread David Morávek
y > supported and also used by Flink users [1]. There's also MapR FS, but I > doubt if that is still used. > > Best regards, > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/ > > On Mon, 6 Dec 2021 at 12:28, David Morávek wr

[DISCUSS] Strong read-after-write consistency of Flink FileSystems

2021-12-06 Thread David Morávek
Hi Everyone, as outlined in FLIP-194 discussion [1], for the future directions of Flink HA services, I'd like to verify my thoughts around guarantees of the distributed filesystems used with Flink. Currently some of the services (*JobGraphStore*, *CompletedCheckpointStore*) are implemented using

Re: GCS/Object Storage Rate Limiting

2021-12-06 Thread David Morávek
th (especially with RocksDB). >> > > Can you elaborate on this a bit? We aren't changing the parallelism when > restoring. > > On Thu, Dec 2, 2021 at 10:48 AM David Morávek wrote: > >> Hi Kevin, >> >> this happens only when the pipeline is started up fro

Re: Unable to create new native thread error

2021-12-06 Thread David Morávek
gt; > > > Another question, I see that the most critical issue (FLINK-25022) is in > progress and should be released on with version 1.13.4 , do you know when > this version is planned to be released? > > > > Thanks again, > > Ilan. > > > > *From: *David Moráv

Re: Buffering when connecting streams

2021-12-02 Thread David Morávek
framework calls the operator’s processElement1 continuously (even for > several minutes) before calling processElement2 a single time? How does the > framework decide when to switch the stream processing when the streams are > connected? > > > > Regards, > > Alexis. > &

Re: custom metrics in elasticsearch ActionRequestFailureHandler

2021-12-02 Thread David Morávek
Hi Lars, quickly looking at the ES connector code, I think you're right and there is no way to do that :( In general I'd say that being able to expose metrics is a valid request. I can imagine having some kind of `RichActionRequestFailureHandler` with `{get|set}RuntimeContext` methods. More or

Re: Watermark behavior when connecting streams

2021-12-02 Thread David Morávek
tlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#consecutive-windowed-operations > > > > Regards, > > Alexis. > > > > *From:* David Morávek > *Sent:* Donnerstag, 2. Dezember 2021 17:26 > *To:* Alexis Sarda-Espinosa > *Cc:* user@fli

Re: Watermark behavior when connecting streams

2021-12-02 Thread David Morávek
Hi Alexis, please take a look at AbstractStreamOperator [1] for details how the watermark is calculate for TwoInputOperator. It uses pretty much the same approach as for with the single input one (it simply takes a minimum). For watermark re-assignment, we ignore input watermark unless it's

Re: Buffering when connecting streams

2021-12-02 Thread David Morávek
tion I > can’t tell Flink to "halt" processElement1 and switch to the other stream > depending on watermarks. I could look into TwoInputStreamOperator if you > think that’s the best approach. > > > > Regards, > > Alexis. > > > > *From:* David Moráve

Re: Broadcast and watermark

2021-12-02 Thread David Morávek
One more thought, if you're "broadcasting" the output of the KafkaSource, it may as well be the case that some partition is empty? Best, D. On Thu, Dec 2, 2021 at 5:11 PM David Morávek wrote: > Hi Sweta, > > the output timestamp seems reasonable to me. I guess you'r

Re: Broadcast and watermark

2021-12-02 Thread David Morávek
Hi Sweta, the output timestamp seems reasonable to me. I guess you're concerned about watermarks you're seeing, is that correct? final Instant min = Instant.ofEpochMilli(Long.MIN_VALUE); final Instant max = Instant.ofEpochMilli(Long.MAX_VALUE); System.out.printf("Min: %s, Max: %s%n", min, max);

Re: Buffering when connecting streams

2021-12-02 Thread David Morávek
t been able to come up with a good way of > ensuring that all data from the side stream for a given minute is processed > by processElement2 before all data for the same (windowed) minute reaches > processElement1, even when considering watermarks. > > > > Regards, > > Alexis. &g

Re: GCS/Object Storage Rate Limiting

2021-12-02 Thread David Morávek
Hi Kevin, this happens only when the pipeline is started up from savepoint / retained checkpoint right? Guessing from the "path" you've shared it seems like a RockDB based retained checkpoint. In this case all task managers need to pull state files from the object storage in order to restore.

Re: Unable to create new native thread error

2021-12-02 Thread David Morávek
Hi Ilan, we are aware of multiple issues when web-submission can result in classloader / thread local leaks, which could potentially result in the behavior you're describing. We're working on addressing them. FLINK-25022 [1]: The most critical one leaking thread locals. FLINK-25027 [2]: Is only

Re: Buffering when connecting streams

2021-12-02 Thread David Morávek
om open > to close) triggered on second 17, and my windows are evaluated every > minute, so it wasn’t a race condition. > > > > Regards, > > Alexis. > > > > *From:* David Morávek > *Sent:* Donnerstag, 2. Dezember 2021 14:52 > *To:* Alexis Sarda-Espinosa >

Re: Buffering when connecting streams

2021-12-02 Thread David Morávek
Hi Alexis, I'm not sure what "watermark" step refers to in you graph, but in general I'd say your intuition is correct. For the "buffering" part, each sub-task needs to send data via data exchange (last operator in chain) has an output buffer and the operator that consumes this data (maybe on

Re: Question about relationship between operator instances and keys

2021-12-02 Thread David Morávek
Hi haocheng, in short it works as follows: - Each parallel instance of an operator is responsible for one to N key groups. - Each parallel instance belongs to a slot, which is tied with a single thread (slot may actually introduce multiple subtasks) - # of keygroups for each operator = max

Re: Issues in Batch Jobs Submission for a Session Cluster

2021-12-02 Thread David Morávek
Hi Jay, It's hard to say what going on here. My best guess is that you're running out of memory for your process (eg. hitting ulimit). Can you please start with checking the ulimits memory usage of your container? For the cleanup, right now it may happen in some failover scenarios that we don't

Re: FLink Accessing two hdfs cluster

2021-11-30 Thread David Morávek
Hi chenqizhu, this exception doesn't seem to come from Flink, but rather from a YARN container bootstrap. When YARN container starts up, it needs to download resources from HDFS (your job archives / configuration / distributed cache / ...) which are necessary for startup of the user application

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-09 Thread David Morávek
; >> at >> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244) >> ~[drivinghabit-stream-calculator-2.0-SNAPSHOT.jar:?] >> >> at java.lang.Thread.run(Thread.java:829) [?:?] >> > > Can it be the reason why my pipeline is s

Re: Restarting a job with drain flag set to true

2021-11-08 Thread David Morávek
ine upgrades. Or am I missing something? > > Thanks, > > Pedro > > > On Mon, 8 Nov 2021 at 15:58, David Morávek wrote: > >> Hi Pedro, >> >> draining basically means that all of the sources will finish and progress >> their watermark to end of th

Re: Restarting a job with drain flag set to true

2021-11-08 Thread David Morávek
Hi Pedro, draining basically means that all of the sources will finish and progress their watermark to end of the global window, which will fire all of the triggers as a result. In other words, it will trigger the _ON_TIME_ results from all of the unfinished windows, even though they might not

Re: Elasticsearch6 connector in flink stand alone

2021-11-08 Thread David Morávek
Hi Ravi, I'm moving this thread to the user@flink mailing list, which is designed for these type of questions. For your issue, I don't think it's related to the elasticsearch integration. It seems like there is something wrong with your log4j setup. Either you have a conflicting log4j jars on

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-08 Thread David Morávek
Hi Dongwon, There are currently no metrics for the async work-queue size (you should be able to see the queue stats with debug logs enabled though [1]). As far as I can tell from looking at the code, the async operator is able to checkpoint even if the work-queue is exhausted. Arvid can you

Re: unsubscribe

2021-11-08 Thread David Morávek
Hi Peter, to unsubscribe, please send an email to user-unsubscr...@flink.apache.org Best, D. On Fri, Nov 5, 2021 at 9:28 AM Peter Schrott wrote: > unsubscribe >

Re: [External] : Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-11-02 Thread David Morávek
s > > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#enabling-and-configuring-checkpointing > > > > Best, > > Fuyao > > > > > > > > *From: *David Morávek > *Date: *Friday,

Re: New blog post published - Sort-Based Blocking Shuffle Implementation in Flink

2021-11-02 Thread David Morávek
Thanks Daisy and Kevin for a great write up! ;) Especially the 2nd part was really interesting, I really like the idea of the single spill file with a custom scheduling of read requests. Best, D. On Mon, Nov 1, 2021 at 10:01 AM Daisy Tsang wrote: > Hey everyone, we have a new two-part post

Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-10-30 Thread David Morávek
Hi Fuyao, this is a great question ;) 1) First let's be clear on what the reactive mode actually is. Reactive Mode is related to how the Flink makes use of the newly available resources. It greedily uses all of the resources that are available in your Flink cluster (if new task manager joins

  1   2   >