Slow restart from savepoint with large broadcast state when increasing parallelism

2022-12-14 Thread Ken Krugler
Hi all, I have a job with a large amount of broadcast state (62MB). I took a savepoint when my workflow was running with parallelism 300. I then restarted the workflow with parallelism 400. The first 297 sub-tasks restored their broadcast state fairly quickly, but after that it slowed to a

[ANNOUNCE] Apache Flink Kubernetes Operator 1.3.0 released

2022-12-14 Thread Őrhidi Mátyás
The Apache Flink community is very happy to announce the release of Apache Flink Kubernetes Operator 1.3.0. Release highlights: - Upgrade to Fabric8 6.x.x and JOSDK 4.x.x - Restart unhealthy Flink clusters - Contribute the Flink Kubernetes Operator to OperatorHub - Publish

Re: AsyncDataStream: Retries keep executing after timeout

2022-12-14 Thread Lincoln Lee
Hi, is this case running as an IT case locally, or as a streaming job on a cluster? If it's the former, one thing I can think of is that local testing with a bounded data source (with only a few test records) ends its input very quickly and then triggers the endOfInput logic of AsyncWaitOperator, that

Re: [SURVEY] Drop Share and Key_Shared subscription support in Pulsar connector

2022-12-14 Thread 盛宇帆
Hi Zili, Thanks for picking up this discussion. Here is my answer: I agree with your first point. If the problems are related to Pulsar, they should be reported to the Pulsar repo. But these flaky tests only occur on the Shared or Key_Shared subscription with the transaction and I can’t

Re: [SURVEY] Drop Share and Key_Shared subscription support in Pulsar connector

2022-12-14 Thread Zili Chen
Hi Yufan, Thanks for starting this discussion. My two cents: 1. Submitting the instability and performance issues to the Pulsar repo as well would help upstream fix the transaction issues. 2. Could you elaborate on whether and (if so) why we should drop the Shared and Key_Share

Re: Could not restore keyed state backend for KeyedProcessOperator

2022-12-14 Thread Lars Skjærven
As far as I understand, we are not specifying anything for the restore mode, so I guess the default (NO_CLAIM) is what we're using. We're using Ververica Platform to handle deploys, and it is a bit obscure what happens underneath. It happened again this morning: Caused by:

[SURVEY] Drop Share and Key_Shared subscription support in Pulsar connector

2022-12-14 Thread 盛宇帆
Hi, I'm the maintainer of flink-connector-pulsar. I would like to start a survey on a proposed functional change in flink-connector-pulsar. I have created a ticket on JIRA and pasted its description here: A lot of Pulsar connector test unstable

Re: Can't use nested attributes as watermarks in Table

2022-12-14 Thread Theodor Wübker
Actually, this behaviour is documented (see the Watermarks section, where it is stated that the column must be a “top-level” column). So I suppose there is a reason. Nevertheless it is quite a