Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-08-26 Thread Matthias Pohl
Hi Denis, I did a bit of digging: It looks like there is no way to specify them independently. You can find documentation about pod templates for TaskManager and JobManager [1]. But even there it states that for cpu and memory, the resource specs are overwritten by the Flink configuration. The

Re: Queries regarding Flink upgrade strategies

2021-08-26 Thread Matthias Pohl
Hi Amit, upgrading Flink versions means that you should stop your jobs with a savepoint first. A new cluster with the new Flink version can be deployed next. Then, this cluster can be used to start the jobs from the previously created savepoints. Each job should pick up the work from where it

Re: Bulk Scheduler timeout when creating several jobs in flink kubernetes HA deployment

2021-08-26 Thread Matthias Pohl
Hi Gil, could you provide the complete logs (TaskManager & JobManager) for us to investigate it? The error itself and the behavior you're describing sounds like expected behavior if there are not enough slots available for all the submitted jobs to be handled in time. Have you tried increasing the

Re: 1.13 Flamegraphs

2021-08-13 Thread Matthias Pohl
Hi Mason, I'm adding Alex to the thread as he might be able to help answer this question in the most precise way next week. Best, Matthias On Fri, Aug 6, 2021 at 7:43 PM Mason Chen wrote: > Hi all, > > Does the sample processing also sample threads that do not belong to the > Flink framework?

Re: ProcessFunctionTestHarnesses for testing Python functions

2021-08-13 Thread Matthias Pohl
Hi Bogdan, it does not look like it is by just doing a brief check of the code. But maybe Dian can give a more detailed answer here. I'm gonna add him to this thread. Best, Matthias On Wed, Jun 9, 2021 at 3:47 PM Bogdan Sulima wrote: > Hi all, > > in Java/Scala i was using

Re: Exception in snapshotState suppresses subsequent checkpoints

2021-07-01 Thread Matthias Pohl
Hi Tao, it looks like it should work considering that you have a sleep of 1 second before each emission. I'm going to add Roman to this thread. Maybe, he has sees something obvious which I'm missing. Could you run the job with the log level set to debug and provide the logs once more?

Re: NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-30 Thread Matthias Pohl
ent? > > Thomas > > On Tue, Jun 29, 2021 at 1:41 AM Matthias Pohl > wrote: > >> Hi Rommel, Hi Thomas, >> Apache Parquet was bumped from 1.10.0 to 1.11.1 for Flink 1.12 in >> FLINK-19137 [1]. The error you're seeing looks like some dependency issue >&g

Re: Recommended way to submit a SQL job via code without getting tied to a Flink version?

2021-06-29 Thread Matthias Pohl
Hi Sonam, what's the reason for not using the Flink SQL client? Because of the version issue? I only know that FlinkSQL's state is not backwards-compatible between major Flink versions [1]. But that seems to be unrelated to what you describe. I'm gonna add Jark and Timo to this thread. Maybe,

Re: Monitoring Exceptions using Bugsnag

2021-06-29 Thread Matthias Pohl
Hi Kevin, I haven't worked with Bugsnag. So, I cannot give more input on that one. For Flink, exceptions are handled by the job's scheduler. Flink collects these exceptions in some bounded queue called the exception history [1]. It collects task failures but also global failures which make the job

Re: NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-29 Thread Matthias Pohl
Hi Rommel, Hi Thomas, Apache Parquet was bumped from 1.10.0 to 1.11.1 for Flink 1.12 in FLINK-19137 [1]. The error you're seeing looks like some dependency issue where you have a version other than 1.11.1 of org.apache.parquet:parquet-column:jar on your classpath? Matthias [1]

Re: Triggering Savepoint fails to write data to S3 store

2021-05-28 Thread Matthias Pohl
On Fri, May 28, 2021 at 8:21 AM Matthias Pohl > wrote: > >> Hi Robert, >> it would be interesting to see the corresponding taskmanager/jobmanager >> logs. That would help in finding out why the savepoint creation failed. >> Just to verify: The savepoint data wasn't wri

Re: How can I use different user run flink

2021-05-28 Thread Matthias Pohl
Hi igyu, I'm not sure whether I can be of much help here because I'm not that familiar with Kerberos. But the Flink documentation [1] suggests deploying separate Flink clusters for each keytab. Did you try that? Best, Matthias [1]

Re: JM cannot recover with Kubernetes HA

2021-05-28 Thread Matthias Pohl
Hi Enrique, thanks for reaching out to the community. I'm not 100% sure what problem you're facing. The log messages you're sharing could mean that the Flink cluster still behaves as normal having some outages and the HA functionality kicking in. The behavior you're seeing with leaders for the

Re: Query related to Minimum scrape interval for Prometheus and fetching metrics of all vertices in a job through Flink Rest API

2021-05-28 Thread Matthias Pohl
Hi Ashutosh, you can set the metrics update interval through metrics.fetcher.update-interval [1]. Unfortunately, there is no single endpoint to collect all the metrics in a more efficient way other than the metrics endpoints provided in [2]. I hope that helps. Best, Matthias [1]

Re: Alternate way to pass plugin Jars

2021-05-28 Thread Matthias Pohl
Hi Vijayendra, Thanks for reaching out to the Flink community. There is no other way I know of to achieve what you're looking for. The plugins support is provided through the respective ./plugins/ directory as described in the docs [1]. Best, Matthias [1]

Re: Heartbeat Timeout

2021-05-28 Thread Matthias Pohl
Hi Robert, increasing heap memory usage could be due to some memory leak in the user code. Have you analyzed a heap dump? About the TM logs you shared. I don't see anything suspicious there. Nothing about memory problems. Are those the correct logs? Best, Matthias On Thu, May 27, 2021 at 6:01 PM

Re: Triggering Savepoint fails to write data to S3 store

2021-05-28 Thread Matthias Pohl
Hi Robert, it would be interesting to see the corresponding taskmanager/jobmanager logs. That would help in finding out why the savepoint creation failed. Just to verify: The savepoint data wasn't written to S3 even after the timeout happened, was it? Best, Matthias On Thu, May 27, 2021 at 7:50

Re: Regarding FLIP-91's status

2021-05-28 Thread Matthias Pohl
Hi Sonam, It looks like it has been stale for some time. You might be able to restart the discussion replying to the respective thread in the dev mailing list [1]. You seem to be right about the repository based on Jark's reply in the related ticket FLINK-15472 [2]. I'm adding Jark to the thread.

Re: KafkaSource

2021-05-28 Thread Matthias Pohl
Hi Alexey, the two implementations are not compatible. You can find information on how to work around this in the Kafka Connector docs [1]. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#upgrading-to-the-latest-connector-version

Re: Flink upgraded from 1.10.0 to 1.12.0

2021-05-28 Thread Matthias Pohl
Hi 王炳焱, thanks for reaching out to the Flink community and sorry for the late reply. Unfortunately, Flink SQL does not support state backwards compatibility. There is no clear pointer in the documentation that states that. I created FLINK-22799 [1] to cover this. In the mean time, you could try

Re: [VOTE] Release 1.13.1, release candidate #1

2021-05-26 Thread Matthias Pohl
Hi Dawid, +1 (non-binding) Thanks for driving this release. I checked the following things: - downloaded and build source code - verified checksums - double-checked diff of pom files between 1.13.0 and 1.13.1-rc1 - did a visual check of the release blog post - started cluster and ran jobs

Re: yarn ship from s3

2021-05-26 Thread Matthias Pohl
Hi Vijay, have you tried yarn-ship-files [1] or yarn-ship-archives [2]? Maybe, that's what you're looking for... Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#yarn-ship-files [2]

Re: Root Exception can not be shown on Web UI in Flink 1.13.0

2021-05-19 Thread Matthias Pohl
] Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-22688 On Wed, May 19, 2021 at 5:45 AM Gary Wu wrote: > Thanks! I have updated the detail and task manager log in > https://issues.apache.org/jira/browse/FLINK-22688. > > Regards, > -Gary > > On Tue, 18 May 2021 at

Re: Root Exception can not be shown on Web UI in Flink 1.13.0

2021-05-18 Thread Matthias Pohl
Sorry, for not getting back earlier. I missed that thread. It looks like some wrong assumption on our end. Hence, Yangze and Guowei are right. I'm gonna look into the issue. Matthias On Fri, May 14, 2021 at 4:21 AM Guowei Ma wrote: > Hi, Gary > > I think it might be a bug. So would you like to

Re: MemoryStateBackend Issue

2021-05-17 Thread Matthias Pohl
AM Milind Vaidya wrote: > Sounds good. > How do I achieve stop with savepoint ? > > - Milind > > On Mon, Apr 26, 2021 at 12:55 AM Matthias Pohl > wrote: > >> I'm not sure what you're trying to achieve. Are you trying to simulate a >> task failure? Or are

Re: [ANNOUNCE] Apache Flink 1.13.0 released

2021-05-04 Thread Matthias Pohl
Yes, thanks for managing the release, Dawid & Guowei! +1 On Tue, May 4, 2021 at 4:20 PM Etienne Chauchot wrote: > Congrats to everyone involved ! > > Best > > Etienne > On 03/05/2021 15:38, Dawid Wysakowicz wrote: > > The Apache Flink community is very happy to announce the release of Apache >

Re: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.R

2021-05-04 Thread Matthias Pohl
t; sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288) > > On Tue, May 4, 2021 at 11:47 AM Rag

Re: Interactive Programming in Flink (Cache operation)

2021-05-04 Thread Matthias Pohl
.nabble.com/RESULT-VOTE-FLIP-36-Support-Interactive-Programming-in-Flink-Table-API-td45024.html On Tue, May 4, 2021 at 8:12 AM Jack Kolokasis wrote: > Hi Matthias, > > Thank you for your reply. Are you going to include it in future versions? > > Best, > Iacovos > On 4/5/21 9:1

Re: Interactive Programming in Flink (Cache operation)

2021-05-04 Thread Matthias Pohl
Hi Iacovos, unfortunately, it doesn't as the related FLINK-19343 [1] is not resolved, yet. The release notes for Flink 1.13 can be found in [2]. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-19343 [2] https://flink.apache.org/news/2021/05/03/release-1.13.0.html On Mon, May 3,

Re: savepoint command in code

2021-05-04 Thread Matthias Pohl
Hi Abdullah, is there a reason you're not considering triggering the stop-with-savepoint operation through the REST API [1]? I'm not entirely sure whether I understand you correctly: ./bin/flink is an executable. Why Would you assume it to be shown as a directory? You would need to provide

Re: StopWithSavepoint() method doesn't work in Java based flink native k8s operator

2021-05-03 Thread Matthias Pohl
Hi Fuyao, sorry for not replying earlier. The stop-with-savepoint operation shouldn't only suspend but terminate the job. Is it that you might have a larger state that makes creating the savepoint take longer? Even though, considering that you don't experience this behavior with your 2nd solution,

Re: Zookeeper or Kubernetes for HA?

2021-05-03 Thread Matthias Pohl
Hi Vishal, Do I understand you correctly that you're wondering whether you should stick to ZooKeeper HA on Kubernetes vs Kubernetes HA? You could argue that ZooKeeper might be better since it's already supported for longer and, therefore, better tested. The Kubernetes HA implementation left the

Re: Flink(1.12.2/scala 2.11) HA with Zk in kubernetes standalone mode deployment is not working

2021-05-03 Thread Matthias Pohl
Hi Bhagi, Thanks for reaching out to the Flink community. The error the UI is showing is normal during an ongoing leader election. Additionally, the connection refused warnings seem to be normal according to other mailing list threads. Are you referring to the UI error as the issue you are facing?

Re: Protobuf support with Flink SQL and Kafka Connector

2021-05-03 Thread Matthias Pohl
Hi Shipeng, it looks like there is an open Jira issue FLINK-18202 [1] addressing this topic. You might want to follow up on that one. I'm adding Timo and Jark to this thread. They might have more insights. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-18202 On Sat, May 1, 2021

Re: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.R

2021-05-03 Thread Matthias Pohl
Hi Ragini, this is a dependency version issue. Flink 1.8.x does not support Hadoop 3, yet. The support for Apache Hadoop 3.x was added in Flink 1.11 [1] through FLINK-11086 [2]. You would need to upgrade to a more recent Flink version. Best, Matthias [1]

Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Matthias Pohl
Hi Salva, unfortunately, I have no experience with Bazel. Just by looking at the code you shared I cannot come up with an answer either. Have you checked out the ML thread in [1]? It provides two other examples where users used Bazel in the context of Flink. This might give you some hints on where

Re: Flink Version 1.11 job savepoint failures

2021-05-03 Thread Matthias Pohl
Hi Rainie, the savepoint creation failed due to some tasks already being finished. It looks like you ran into an issue that was (partially as FLINK-21066 [1] is only a subtask of a bigger issue?) addressed in Flink 1.13 (see FLINK-21066). I'm pulling Yun Gao into this thread. Let's see whether Yun

Re: MemoryStateBackend Issue

2021-04-26 Thread Matthias Pohl
gt; > > > On Fri, Apr 23, 2021 at 12:32 AM Matthias Pohl > wrote: > >> One additional question: How did you stop and restart the job? The >> behavior you're expecting should work with stop-with-savepoint. Cancelling >> the job and then just restarting it

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-23 Thread Matthias Pohl
-22425 On Fri, Apr 23, 2021 at 8:16 AM Matthias Pohl wrote: > To me, it sounds strange. I would have expected it to work with > `allowedLateness` and `sideOutput` being defined. I pull in David to have a > look at it. Maybe, he has some more insights. I haven't worked that much > with l

Re: Debezium CDC | OOM

2021-04-23 Thread Matthias Pohl
> Please refer to this - https://github.com/apache/iceberg/issues/2504 > > > > On Thu, Apr 22, 2021 at 8:44 PM Matthias Pohl > wrote: > >> Hi Ayush, >> Which state backend have you configured [1]? Have you considered trying >> out RocksDB [2]? RocksDB might

Re: MemoryStateBackend Issue

2021-04-23 Thread Matthias Pohl
PM Matthias Pohl wrote: > Hi Milind, > I bet someone else might have a faster answer. But could you provide the > logs and config to get a better understanding of what your issue is? > In general, the state is maintained even in cases where a TaskManager > fails. > > Best,

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-23 Thread Matthias Pohl
Another few questions: Have you had the chance to monitor/profile the memory usage? What section of the memory was used excessively? Additionally, could @dhanesh arole 's proposal solve your issue? Matthias On Fri, Apr 23, 2021 at 8:41 AM Matthias Pohl wrote: > Thanks for sharing these deta

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-23 Thread Matthias Pohl
Thanks for sharing these details. Looking into FLINK-14952 [1] (which introduced this option) and the related mailing list thread [2], it feels like your issue is quite similar to what is described in there even though it sounds like this issue is mostly tied to bounded jobs. But I'm not sure what

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-23 Thread Matthias Pohl
sElement(KeyedTimedValue value, Context ctx >> , >> Collector> out) throws Exception { >> if (ctx.timestamp() + allowedLateness <= ctx.timerService(). >> currentWatermark()) { >> out.collect(value); >> } >> } >> } >> >> &

Re: Question about snapshot file

2021-04-23 Thread Matthias Pohl
hank you > > On Thu, Apr 22, 2021 at 10:19 AM Matthias Pohl > wrote: > >> Hi Abdullah, >> the metadata file contains handles to the operator states of the >> checkpoint [1]. You might want to have a look into the State Processor API >> [2]. >

Re: [ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Matthias Pohl
Thanks for setting this up, Konstantin. +1 On Thu, Apr 22, 2021 at 11:16 AM Konstantin Knauf wrote: > Hi everyone, > > all of the Jira Bot rules are live now. Particularly in the beginning the > Jira Bot will be very active, because the rules apply to a lot of old, > stale tickets. So, if you

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Matthias Pohl
filter. Can you >> please validate. That aside I already have the lateness configured for the >> session window ( the normal withLateNess() ) and this looks like a session >> window was not collected and still is alive for some reason ( a flink bug ? >> ) >> >&g

Re: Question about snapshot file

2021-04-22 Thread Matthias Pohl
Hi Abdullah, the metadata file contains handles to the operator states of the checkpoint [1]. You might want to have a look into the State Processor API [2]. Best, Matthias [1]

Re: Debezium CDC | OOM

2021-04-22 Thread Matthias Pohl
Hi Ayush, Which state backend have you configured [1]? Have you considered trying out RocksDB [2]? RocksDB might help with persisting at least keyed state. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend [2]

Re: Kubernetes Setup - JM as job vs JM as deployment

2021-04-22 Thread Matthias Pohl
Hi Gil, I'm not sure whether I understand you correctly. What do you mean by deploying the job manager as "job" or "deployment"? Are you referring to the different deployment modes, Flink offers [1]? These would be independent of Kubernetes. Or do you wonder what the differences are between the

Re: Long to Timestamp(3) Conversion

2021-04-22 Thread Matthias Pohl
Hi Aeden, there are some improvements to time conversions coming up in Flink 1.13. For now, the best solution to overcome this is to provide a user-defined function. Hope, that helps. Best, Matthias On Wed, Apr 21, 2021 at 9:36 PM Aeden Jameson wrote: > I've probably overlooked something

Re: MemoryStateBackend Issue

2021-04-22 Thread Matthias Pohl
Hi Milind, I bet someone else might have a faster answer. But could you provide the logs and config to get a better understanding of what your issue is? In general, the state is maintained even in cases where a TaskManager fails. Best, Matthias On Thu, Apr 22, 2021 at 5:11 AM Milind Vaidya

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Matthias Pohl
I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks for bringing it up. Matthias [1] https://issues.apache.org/jira/browse/FLINK-22414 On Fri, Apr 16, 2021 at 9:32 AM Flavio Pompermaier wrote: > Hi Yang, > isn't this something to fix? If I look at the documentation at

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Matthias Pohl
Hi Vishal, based on the error message and the behavior you described, introducing a filter for late events is the way to go - just as described in the SO thread you mentioned. Usually, you would collect late events in some kind of side output [1]. I hope that helps. Matthias [1]

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread Matthias Pohl
Hi, I have a few questions about your case: * What is the option you're referring to for the bounded shuffle? That might help to understand what streaming mode solution you're looking for. * What does the job graph look like? Are you assuming that it's due to a shuffling operation? Could you

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-04-14 Thread Matthias Pohl
s, > > >> Till > > >> > > >> On Thu, Mar 25, 2021 at 1:49 PM Konstantin Knauf > > wrote: > > >>> > > >>> Hi Matthias, > > >>> > > >>> Thank you for following up on this. +1 to officially deprecate Mesos > >

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-04-01 Thread Matthias Pohl
he UI, I see the following: > > *Latest Failed Checkpoint ID: 50Failure Time: 2021-03-31 09:34:43Cause: > Asynchronous task checkpoint failed.* > > What does this failure mean? > > > On Wed, Mar 31, 2021 at 9:22 AM Matthias Pohl > wrote: > >> Hi Claude, >>

Re: JDBC connector support for JSON

2021-04-01 Thread Matthias Pohl
Hi Fanbin, I'm not that familiar with the FlinkSQL features. But it looks like the JdbcConnector does not support Json as stated in the documentation [1]. You might work around it by implementing your own user-defined functions [2]. I hope this helps. Matthias [1]

Re: IO benchmarking

2021-04-01 Thread Matthias Pohl
nto one (or > more?) files on persistent storage. I'll check out the code pointers! > > On Wed, Mar 31, 2021 at 7:07 AM Matthias Pohl > wrote: > >> Hi Deepthi, >> 1. Have you had a look at flink-benchmarks [1]? I haven't used it but it >> might be helpful. >

Re: IO benchmarking

2021-03-31 Thread Matthias Pohl
Hi Deepthi, 1. Have you had a look at flink-benchmarks [1]? I haven't used it but it might be helpful. 2. Unfortunately, Flink doesn't provide metrics like that. But you might want to follow FLINK-21736 [2] for future developments. 3. Is there anything specific you are looking for? Unfortunately,

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-03-31 Thread Matthias Pohl
Hi Claude, thanks for reaching out to the Flink community. Could you provide the Flink logs for this run to get a better understanding of what's going on? Additionally, what exact Flink 1.12 version are you using? Did you also verify that the snapshot was created by checking the actual folder?

Re: Proper way to get DataStream

2021-03-31 Thread Matthias Pohl
Hi Maminspapin again, have you checked whether your topic actually contains data that matches your schema specified through cep.model.User? Best, Matthias On Tue, Mar 30, 2021 at 3:39 PM Maminspapin wrote: > Hi, > > I'm trying to solve a task with getting data from topic. This topic keeps >

Re: DataStream from kafka topic

2021-03-31 Thread Matthias Pohl
Ok, it looks like you've found that solution already based on your question in [1]. [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Proper-way-to-get-DataStream-lt-GenericRecord-gt-td42640.html On Wed, Mar 31, 2021 at 1:26 PM Matthias Pohl wrote: > Hi Maminspapin,

Re: DataStream from kafka topic

2021-03-31 Thread Matthias Pohl
Hi Maminspapin, I haven't worked with Kafka/Flink, yet. But have you had a look at the docs about the DeserializationSchema [1]? It mentions ConfluentRegistryAvroDeserializationSchema. Is this something you're looking for? Best, Matthias [1]

Re: Fw:A question about flink watermark illustration in official documents

2021-03-31 Thread Matthias Pohl
Hi 罗昊, the 2nd picture is meant to visualize the issue of out-of-orderness in general. I'd say it's not referring to a specific strategy. But one way to interpret the image is using the BoundedOutOfOrderness strategy for watermark generation [1]: You can define an upper bound B for the

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-26 Thread Matthias Pohl
-watermak-interval and setAutoWatermarkInterval are >> effectively the same setting/option. However I am not sure if Table API >> interprets it in the same way as DataStream APi. The documentation you >> linked, Aeden, describes the SQL API. >> >> @Jark @Timo Could you verify

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-25 Thread Matthias Pohl
Hi everyone, considering the upcoming release of Flink 1.13, I wanted to revive the discussion about the Mesos support ones more. Mesos is also already listed as deprecated in Flink's overall roadmap [1]. Maybe, it's time to align the documentation accordingly to make it more explicit? What do

Re: DataDog and Flink

2021-03-24 Thread Matthias Pohl
Hi Vishal, what about the TM metrics' REST endpoint [1]. Is this something you could use to get all the metrics for a specific TaskManager? Or are you looking for something else? Best, Matthias [1]

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-24 Thread Matthias Pohl
1. yes - the same key would affect the same state variable 2. you need a join to have the same operator process both streams Matthias On Wed, Mar 24, 2021 at 7:29 AM vishalovercome wrote: > Let me make the example more concrete. Say O1 gets as input a data stream > T1 > which it splits into

Re: Flink Streaming Counter

2021-03-24 Thread Matthias Pohl
o quickly. I am looking for a sample > example where I can increment counters on each stage #1 thru #3 for > DATASTREAM. > Then probably I can print it using slf4j. > > Thanks, > Vijay > > On Tue, Mar 23, 2021 at 6:35 AM Matthias Pohl > wrote: > >> Hi Vijaye

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread Matthias Pohl
Hi Vishal, I'm not 100% sure what you're trying to do. But the partitioning by a key just relies on the key on the used parallelism. So, I guess, what you propose should work. You would have to rely on some join function, though, when merging two input operators into one again. I hope that was

Re: How to get operator uid from a sql

2021-03-23 Thread Matthias Pohl
Hi XU Qinghui, sorry for the late reply. Unfortunately, the operator ID does not mean to be accessible for Flink SQL through the API. You might have a chance to extract the Operator ID through the debug logs. StreamGraphHasherV2.generateDeterministicHash logs out the operator ID [1]: "[main] DEBUG

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Matthias Pohl
Hi Aeden, sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this. Thanks, Matthias [1]

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
name. > > I can use String, Int, Enum, Long type keys in the Key that I send in the > Query getKvState … but the moment I introduce a TreeMap, even though it > contains a simple one entry String, String, it doesn’t work … > > Thanks, > Sandeep > > On 23-Mar-2021, at 7:00 P

Re: Flink Streaming Counter

2021-03-23 Thread Matthias Pohl
Hi Vijayendra, thanks for reaching out to the Flink community. What do you mean by displaying it in your local IDE? Would it be ok to log the information out onto stdout? You might want to have a look at the docs about setting up a slf4j metrics report [1] if that's the case. Best, Matthias [1]

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Hi Sandeep, the equals method does not compare the this.map with that.map but that.dimensions. ...at least in your commented out code. Might this be the problem? Best, Matthias On Tue, Mar 23, 2021 at 5:28 AM Sandeep khanzode wrote: > Hi, > > I have a stream that exposes the state for

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
.nabble.com/Evenly-Spreading-Out-Source-Tasks-tp42108p42235.html On Tue, Mar 23, 2021 at 1:42 PM Matthias Pohl wrote: > Hi Vignesh, > are you trying to achieve an even distribution of tasks for this one > operator that has the parallelism set to 16? Or do you observe the > described b

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
Hi Vignesh, are you trying to achieve an even distribution of tasks for this one operator that has the parallelism set to 16? Or do you observe the described behavior also on a job level? I'm adding Chesnay to the thread as he might have more insights on this topic. Best, Matthias On Mon, Mar

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread Matthias Pohl
Hi Danesh, thanks for reaching out to the Flink community. Checking the code, it looks like the OutputStream is added to a CloseableRegistry before writing to it [1]. My suspicion is - based on the exception cause - that the CloseableRegistry got triggered while restoring the state. I tried to

Re: Read the metadata files (got from savepoints)

2021-03-22 Thread Matthias Pohl
Hi Abdullah, you might also want to have a look at the State Processor API [1]. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html On Mon, Mar 22, 2021 at 6:28 AM Congxian Qiu wrote: > Hi >Maybe you can reach to this test[1] for

Re: Flink History server ( running jobs )

2021-03-19 Thread Matthias Pohl
Hi Vishal, yes, as the documentation explains [1]: Only jobs that reached a globally terminal state are archived into Flink's history server. State information about running jobs can be retrieved through Flink's REST API. Best, Matthias [1]

Re: Evenly Spreading Out Source Tasks

2021-03-12 Thread Matthias Pohl
Hi Aeden, just to be sure: All task managers have the same hardware/memory configuration, haven't they? I'm not 100% sure whether this affects the slot selection in the end, but it looks like this parameter has also an influence on the slot matching strategy preferring slots with less utilization

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-03-10 Thread Matthias Pohl
Hi Abhishek, sorry for the late reply. Did you manage to fix it? One remark: Are you sure you're referring to the right configuration file? log4j-cli.properties is used for the CLI tool [1]. Or do you try to get the logs from within the main of your job? Best, Matthias [1]

Re: Flink application kept restarting

2021-03-03 Thread Matthias Pohl
stractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
Another question is: The timeout of 48 hours sounds strange. There should have been some other system noticing the connection problem earlier assuming that you have a reasonably low heartbeat interval configured. Matthias On Mon, Mar 1, 2021 at 1:22 PM Matthias Pohl wrote: > Tha

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
On Fri, Feb 26, 2021 at 6:33 AM Matthias Pohl > wrote: > >> Hi Rainie, >> the network buffer pool was destroyed for some reason. This happens when >> the NettyShuffleEnvironment gets closed which is triggered when an operator >> is cleaned up, for instance. Maybe, the t

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-28 Thread Matthias Pohl
Hi Abhishek, have you also tried to apply the instructions listed in [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/logging.html#configuring-log4j1 On Mon, Mar 1, 2021 at 4:42 AM Abhishek Shukla wrote: > Hi Matthias, > Thanks for

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
he Flink jar and configuration file."* > > Same mentioned here > <https://docs.cloudera.com/csa/1.2.0/installation/topics/csa-hdfs-home-install.html> > . > > > <https://wints.github.io/flink-web//faq.html#the-yarn-session-crashes-with-a-hdfs-permission-exception-duri

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-26 Thread Matthias Pohl
Hi Bariša, have you had the chance to analyze the memory usage in more detail? An OutOfMemoryError might be an indication for some memory leak which should be solved instead of lowering some memory configuration parameters. Or is it that the off-heap memory is not actually used but blocks the JVM

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
Hi Debraj, thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this

Re: Processing-time temporal join is not supported yet.

2021-02-26 Thread Matthias Pohl
Hi Eric, it looks like you ran into FLINK-19830 [1]. I'm gonna add Leonard to the thread. Maybe, he has a workaround for your case. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-19830 On Fri, Feb 26, 2021 at 11:40 AM eric hoffmann wrote: > Hello > Working with flink 1.12.1 i

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-26 Thread Matthias Pohl
Hi Abhishek, this might be caused by the switch from log4j to log4j2 as the default in Flink 1.11 [1]. Have you had a chance to look at the logging documentation [2] to enable log4j again? Best, Matthias [1]

Re: Get JobId and JobManager RPC Address in RichMapFunction executed in TaskManager

2021-02-26 Thread Matthias Pohl
Hi Sandeep, thanks for reaching out to the community. Unfortunately, the information you're looking for is not exposed in a way that you could access it from within your RichMapFunction. Could you elaborate a bit more on what you're trying to achieve? Maybe, we can find another solution for your

Re: Flink application kept restarting

2021-02-26 Thread Matthias Pohl
Hi Rainie, the network buffer pool was destroyed for some reason. This happens when the NettyShuffleEnvironment gets closed which is triggered when an operator is cleaned up, for instance. Maybe, the timeout in the metric system caused this. But I'm not sure how this is connected. I'm gonna add

Re: Question

2021-02-22 Thread Matthias Pohl
stable/dev/project-configuration.html#maven-quickstart On Mon, Feb 22, 2021 at 5:06 PM Abu Bakar Siddiqur Rahman Rocky < bakar121...@gmail.com> wrote: > Hi Matthias Pohl, > > Thank you for your reply. > > At first, I'm sorry if my question make you confuse. Let me know i

Re: Question

2021-02-22 Thread Matthias Pohl
-docs-release-1.12/try-flink/local_installation.html > > I can run the code in the UI of Apache Flink that is in the bin file of > Apache Flink. If I run a java code from intellij idea or eclipse, then how > can I connect the code to apache flink UI? > > Thank you! > > On F

Re: Flink’s Kubernetes HA services - NOT working

2021-02-15 Thread Matthias Pohl
I'm adding the Flink user ML to the conversation again. On Mon, Feb 15, 2021 at 8:18 AM Matthias Pohl wrote: > Hi Omer, > thanks for sharing the configuration. You're right: Using NFS for HA's > storageDir is fine. > > About the error message you're referring to: I haven't wor

Re: Question

2021-02-12 Thread Matthias Pohl
is not, could you please provide me the source code to access in the > storage where snapshots are saved? > > Thank you > > > > > On Fri, Feb 12, 2021 at 2:44 AM Matthias Pohl > wrote: > >> Hi Abu Bakar Siddiqur Rahman, >> Have you had a look at the Flink d

Re: Question

2021-02-12 Thread Matthias Pohl
Hi Abu Bakar Siddiqur Rahman, Have you had a look at the Flink documentation [1]? It provides step-by-step instructions on how to run a job (the Flink binaries provide example jobs under ./examples) using a local standalone cluster. This should also work on a Mac. You would just need to start the

Re: threading and distribution

2021-02-11 Thread Matthias Pohl
Hi Marco, sorry for the late reply. The documentation you found [1] is already a good start. You can define how many subtasks of an operator run in parallel using the operator's parallelism configuration [2]. Each operator's subtask will run in a separate task slot. There's the concept of slot

<    1   2   3   >