[jira] [Commented] (FLINK-31820) Support data source sub-database and sub-table

2023-05-04 Thread xingyuan cheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719641#comment-17719641
 ] 

xingyuan cheng commented on FLINK-31820:


[~martijnvisser] Hello, sorry for the late reply after the May Day holiday. I 
did some brief research. At present, when a single database grows too large, 
some companies in China adopt sub-database and sub-table (sharding) solutions, 
while companies abroad tend to use distributed databases. The reason sharding 
is chosen instead of migration is that the storage medium of the existing data 
has already been determined, and the cost of migrating to a distributed 
database is too high to be acceptable for the business. That is why we need to 
extend the connector to support sub-database and sub-table.

 

The reference given by [~Thesharing] is a good explanation of MySQL 
sub-database and sub-table. I will update the documentation in the near future.

> Support data source sub-database and sub-table
> --
>
> Key: FLINK-31820
> URL: https://issues.apache.org/jira/browse/FLINK-31820
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: xingyuan cheng
>Priority: Major
>  Labels: pull-request-available
>
> At present, apache/flink-connector-jdbc does not support sub-database and 
> sub-table (sharding). This change adds sub-database and sub-table support 
> for three commonly used databases: MySQL, Postgres, and Oracle.
>  
> Taking Oracle as an example, users only need to use the following 
> configuration:
>  
> {code:java}
> create table oracle_source (
>     EMPLOYEE_ID BIGINT,
>     START_DATE TIMESTAMP,
>     END_DATE TIMESTAMP,
>     JOB_ID VARCHAR,
>     DEPARTMENT_ID VARCHAR
> ) with (
>     type = 'oracle',
>     url = 'jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,}),jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,})',
>     userName = 'userName',
>     password = 'password',
>     dbName = 'hr',
>     table-name = 'order_([0-9]{1,})',
>     timeField = 'START_DATE',
>     startTime = '2007-1-1 00:00:00'
> ); {code}
> In the above code, the dbName attribute corresponds to the schema-name 
> attribute in Oracle or Postgres; for MySQL, the dbName must be specified 
> manually.
>  
> At the same time, I am also developing the CDAS whole-database 
> synchronization syntax for my company, of which data source sub-database 
> and sub-table support is a part. I will add unit tests. For now, please 
> keep this PR in draft status.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #584: [FLINK-31867] Enforce a minimum number of observations within a metric window

2023-05-04 Thread via GitHub


gyfora commented on code in PR #584:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/584#discussion_r1185716270


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##
@@ -52,6 +52,13 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
 .defaultValue(Duration.ofMinutes(5))
 .withDescription("Scaling metrics aggregation window 
size.");
 
+public static final ConfigOption<Integer> METRICS_WINDOW_MIN_OBSERVATIONS =
+autoScalerConfig("metrics.window.min-observations")
+.intType()
+.defaultValue(10)

Review Comment:
   I think the default value here is inconsistent with the default 5-minute 
metric window. I suggest setting the default here to 5.
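   
   A minimal sketch of the suggested change, reusing the `autoScalerConfig` 
builder from the diff above (the description string is illustrative, not the 
actual one):
   
   ```java
   public static final ConfigOption<Integer> METRICS_WINDOW_MIN_OBSERVATIONS =
           autoScalerConfig("metrics.window.min-observations")
                   .intType()
                   .defaultValue(5) // align with the default 5-minute metrics window
                   .withDescription(
                           "Minimum number of observations required within the metrics window.");
   ```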



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31820) Support data source sub-database and sub-table

2023-05-04 Thread xingyuan cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xingyuan cheng updated FLINK-31820:
---
Description: 
At present, apache/flink-connector-jdbc does not support sub-database and 
sub-table (sharding). This change adds sub-database and sub-table support for 
three commonly used databases: MySQL, Postgres, and Oracle.

 

Taking Oracle as an example, users only need to use the following 
configuration:

 
{code:java}
create table oracle_source (
    EMPLOYEE_ID BIGINT,
    START_DATE TIMESTAMP,
    END_DATE TIMESTAMP,
    JOB_ID VARCHAR,
    DEPARTMENT_ID VARCHAR
) with (
    type = 'oracle',
    url = 'jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,}),jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,})',
    userName = 'userName',
    password = 'password',
    dbName = 'hr',
    table-name = 'order_([0-9]{1,})',
    timeField = 'START_DATE',
    startTime = '2007-1-1 00:00:00'
); {code}
In the above code, the dbName attribute corresponds to the schema-name 
attribute in Oracle or Postgres; for MySQL, the dbName must be specified 
manually.
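
As a minimal, hypothetical illustration of how a regex-style table-name such 
as order_([0-9]{1,}) is meant to match sharded physical tables (plain Java 
regex, not the connector's actual matching code):
{code:java}
import java.util.List;
import java.util.regex.Pattern;

public class ShardPatternDemo {
    public static void main(String[] args) {
        // The same pattern used for table-name in the configuration above.
        Pattern shard = Pattern.compile("order_([0-9]{1,})");
        for (String table : List.of("order_0", "order_12", "orders", "order_x")) {
            // order_0 and order_12 match; orders and order_x do not.
            System.out.println(table + " -> " + shard.matcher(table).matches());
        }
    }
}
{code}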

 

At the same time, I am also developing the CDAS whole-database synchronization 
syntax for my company, of which data source sub-database and sub-table support 
is a part. I will add unit tests. For now, please keep this PR in draft status.

  was:
At present, apache/flink-connector-jdbc does not support sub-database and 
sub-table (sharding). This change adds sub-database and sub-table support for 
three commonly used databases: MySQL, Postgres, and Oracle.

 

Taking Oracle as an example, users only need to use the following 
configuration:

 
{code:java}
create table oracle_source (
    EMPLOYEE_ID BIGINT,
    START_DATE TIMESTAMP,
    END_DATE TIMESTAMP,
    JOB_ID VARCHAR,
    DEPARTMENT_ID VARCHAR
) with (
    type = 'oracle',
    url = 'jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,}),jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,})',
    userName = 'userName',
    password = 'password',
    dbName = 'hr',
    tableName = 'job_history',
    timeField = 'START_DATE',
    startTime = '2007-1-1 00:00:00'
); {code}
In the above code, the dbName attribute corresponds to the schema-name 
attribute in Oracle or Postgres; for MySQL, the dbName must be specified 
manually.

 

At the same time, I am also developing the CDAS whole-database synchronization 
syntax for my company, of which data source sub-database and sub-table support 
is a part. I will add unit tests. For now, please keep this PR in draft status.


> Support data source sub-database and sub-table
> --
>
> Key: FLINK-31820
> URL: https://issues.apache.org/jira/browse/FLINK-31820
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: xingyuan cheng
>Priority: Major
>  Labels: pull-request-available
>
> At present, apache/flink-connector-jdbc does not support sub-database and 
> sub-table (sharding). This change adds sub-database and sub-table support 
> for three commonly used databases: MySQL, Postgres, and Oracle.
>  
> Taking Oracle as an example, users only need to use the following 
> configuration:
>  
> {code:java}
> create table oracle_source (
>     EMPLOYEE_ID BIGINT,
>     START_DATE TIMESTAMP,
>     END_DATE TIMESTAMP,
>     JOB_ID VARCHAR,
>     DEPARTMENT_ID VARCHAR
> ) with (
>     type = 'oracle',
>     url = 'jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,}),jdbc:oracle:thin:@//localhost:3306/order_([0-9]{1,})',
>     userName = 'userName',
>     password = 'password',
>     dbName = 'hr',
>     table-name = 'order_([0-9]{1,})',
>     timeField = 'START_DATE',
>     startTime = '2007-1-1 00:00:00'
> ); {code}
> In the above code, the dbName attribute corresponds to the schema-name 
> attribute in Oracle or Postgres; for MySQL, the dbName must be specified 
> manually.
>  
> At the same time, I am also developing the CDAS whole-database 
> synchronization syntax for my company, of which data source sub-database 
> and sub-table support is a part. I will add unit tests. For now, please 
> keep this PR in draft status.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] cy2008 commented on pull request #22498: [FLINK-31928][build] Upgrade okhttp3 to 4.11.0

2023-05-04 Thread via GitHub


cy2008 commented on PR #22498:
URL: https://github.com/apache/flink/pull/22498#issuecomment-1535684672

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases in the Deployment/Execution path that are 
not easily covered, e.g., 
[deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608),
 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104),
 
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
 
   
   The intuition here is that all failures are internal and could skip labeling 
-- but happy to discuss other thoughts?



##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases in the Deployment/Execution path that are 
not easily covered, e.g., 
[deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608),
 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104),
 
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
 
   
   The intuition here is that all these failures are internal and could skip 
labeling -- but happy to discuss other thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases in the Deployment/Execution path that are 
not easily covered, e.g., 
[deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608),
 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104),
 
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
 
   
   The intuition here is that all failures are internal and could skip labeling 
-- but happy to discuss alternatives



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases in the Deployment/Execution path that are 
not easily covered, e.g., 
[deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608),
 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104),
 
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
 
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases in the Deployment/Execution path that are 
not easily covered, e.g., 
[deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608),
 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104),
 
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-jdbc] reswqa commented on a diff in pull request #45: [hotfix] Fix incorrect sql_url in jdbc.yml.

2023-05-04 Thread via GitHub


reswqa commented on code in PR #45:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/45#discussion_r1185677895


##
docs/data/jdbc.yml:
##
@@ -18,4 +18,4 @@
 
 variants:
   - maven: flink-connector-jdbc
-sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-jdbc/$full_version/flink-sql-connector-jdbc-$full_version.jar
+sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc/$full_version/flink-connector-jdbc-$full_version.jar

Review Comment:
   If `3.1.0` is the only version that supports `flink-1.17`, there are two 
possibilities when releasing:
   
   - Create a `v3.1` branch: let `setup_docs.sh` track this branch.
   
   - Do not create a `v3.1` branch: modify the doc in the `jdbc-connector` 
repository, and let `sql_connector_download_table "jdbc"` point to `3.1.0`.
   
   Both of these will automatically change the download URL to link to 
`3.1.0-1.17`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But there are still corner cases that are not easily covered, e.g., 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104)
  
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Again this is covered with the next PR: 
https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782
   
   But still there are corner cases that are not easily covered, e.g., 
[triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104)
  
[UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378)
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185672001


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -473,26 +500,50 @@ public CompletableFuture<Acknowledge> cancel(Time 
timeout) {
 @Override
 public CompletableFuture<Acknowledge> updateTaskExecutionState(
 final TaskExecutionState taskExecutionState) {
-FlinkException taskExecutionException;
+checkNotNull(taskExecutionState, "taskExecutionState");
+// Use the main/caller thread for all updates to make sure they are 
processed in order.
+// (MainThreadExecutor i.e., the akka thread pool does not guarantee 
that)
+// Only detach for a FAILED state update that is terminal and may 
perform io heavy labeling.
+if 
(ExecutionState.FAILED.equals(taskExecutionState.getExecutionState())) {
+return labelFailure(taskExecutionState)
+.thenApplyAsync(
+taskStateWithLabels -> {
+try {
+return 
doUpdateTaskExecutionState(taskStateWithLabels);
+} catch (FlinkException e) {
+throw new CompletionException(e);
+}
+},
+getMainThreadExecutor());
+}
 try {
-checkNotNull(taskExecutionState, "taskExecutionState");
+return CompletableFuture.completedFuture(
+doUpdateTaskExecutionState(taskExecutionState));
+} catch (FlinkException e) {
+return FutureUtils.completedExceptionally(e);
+}
+}
 
+private Acknowledge doUpdateTaskExecutionState(final TaskExecutionState 
taskExecutionState)
+throws FlinkException {
+@Nullable FlinkException taskExecutionException;
+try {
 if (schedulerNG.updateTaskExecutionState(taskExecutionState)) {

Review Comment:
   Btw, InternalFailuresListener#notifyTaskFailure is already covered as part 
of FLINK-31891 -- when a TM disconnection happens we need to release the 
payload slot, and since the error is not fromSchedulerNg we use the 
internalTaskFailuresListener: 
https://github.com/apache/flink/pull/22511/files#diff-d535f910a10f835962b0637e12014068a9727b2152a84223fd9b1bf9c6c074d6R1665
   
   This does not, however, cover the `onMissingDeploymentsOf` case David 
mentioned above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


pgaref commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185667103


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ErrorInfo.java:
##
@@ -98,4 +112,13 @@ public String getExceptionAsString() {
 public long getTimestamp() {
 return timestamp;
 }
+
+/**
+ * Returns the labels associated with the exception.
+ *
+ * @return Map of exception labels
+ */
+public Map<String, String> getLabels() {
+return labels;

Review Comment:
   Great idea, added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa commented on pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


reswqa commented on PR #22447:
URL: https://github.com/apache/flink/pull/22447#issuecomment-1535639126

   Rebased on master to avoid the CI problem caused by 
[FLINK-30972](https://issues.apache.org/jira/browse/FLINK-30972).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32001) SupportsRowLevelUpdate does not support returning only a part of the columns.

2023-05-04 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719601#comment-17719601
 ] 

Jingsong Lee commented on FLINK-32001:
--

CC [~yuxia] 

> SupportsRowLevelUpdate does not support returning only a part of the columns.
> -
>
> Key: FLINK-32001
> URL: https://issues.apache.org/jira/browse/FLINK-32001
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Runtime
>Affects Versions: 1.17.0
>Reporter: Ming Li
>Priority: Major
>
> [FLIP-282|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061]
>  introduces the new Delete and Update API in Flink SQL. The documentation 
> describes that in the case of {{partial-update}} we only need to return the 
> primary key columns and the updated columns.
> In fact, however, the topology of the job is {{source -> cal -> 
> constraintEnforcer -> sink}}, and the constraint check is performed in the 
> {{constraintEnforcer}} operator by index, not by column. If only some 
> columns are returned, the constraint check is wrong, and it can easily 
> throw an {{ArrayIndexOutOfBoundsException}}.
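> A minimal, hypothetical sketch of the failure mode (illustrative code, not 
> Flink's actual ConstraintEnforcer):
> {code:java}
> public class PartialUpdateIndexCheck {
>     public static void main(String[] args) {
>         // Full schema: EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID
>         Object[] fullRow = {1001L, "2007-01-01", "2008-01-01", "JOB_X", "DEPT_9"};
>         // A partial update returns only the primary key and the updated column.
>         Object[] partialRow = {1001L, "JOB_Y"};
>         // The constraint check uses a physical index derived from the full schema.
>         int enforcedIndex = 4; // DEPARTMENT_ID
>         System.out.println(fullRow[enforcedIndex]);    // fine: DEPT_9
>         System.out.println(partialRow[enforcedIndex]); // ArrayIndexOutOfBoundsException
>     }
> }
> {code}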



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719599#comment-17719599
 ] 

Dian Fu edited comment on FLINK-31968 at 5/5/23 2:27 AM:
-

[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it 
should have been addressed in FLINK-28786. Feel free to reopen it if this issue 
still happens after 1.17.1.


was (Author: dianfu):
[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has 
been addressed in FLINK-28786. Feel free to reopen it if this issue still 
happens after 1.17.1.

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in <module>
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> 

[jira] [Commented] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719599#comment-17719599
 ] 

Dian Fu commented on FLINK-31968:
-

[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has 
been addressed in FLINK-28786. Feel free to reopen it if this issue still 
happens after 1.17.1.

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in <module>
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
>     at akka.dispatch.OnComplete.internal(Future.scala:300)
>     at akka.dispatch.OnComplete.internal(Future.scala:297)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
>     at 

[jira] [Closed] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31968.
---
Resolution: Duplicate

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in <module>
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
>     at akka.dispatch.OnComplete.internal(Future.scala:300)
>     at akka.dispatch.OnComplete.internal(Future.scala:297)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>     at 
> 

[jira] [Updated] (FLINK-31988) Implement Python wrapper for new KDS source

2023-05-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31988:

Component/s: API / Python
 Connectors / Kinesis

> Implement Python wrapper for new KDS source
> ---
>
> Key: FLINK-31988
> URL: https://issues.apache.org/jira/browse/FLINK-31988
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Connectors / Kinesis
>Reporter: Hong Liang Teoh
>Priority: Major
>
> *What?*
> - Implement Python wrapper for KDS source
> - Write tests for this KDS source



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-jdbc] leonardBang commented on a diff in pull request #45: [hotfix] Fix incorrect sql_url in jdbc.yml.

2023-05-04 Thread via GitHub


leonardBang commented on code in PR #45:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/45#discussion_r1185651232


##
docs/data/jdbc.yml:
##
@@ -18,4 +18,4 @@
 
 variants:
   - maven: flink-connector-jdbc
-sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-jdbc/$full_version/flink-sql-connector-jdbc-$full_version.jar
+sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc/$full_version/flink-connector-jdbc-$full_version.jar

Review Comment:
   > At present, the `release-1.17` branch of flink tracks the `v3.0` branch of 
the `flink-connector-jdbc` repository.
   
   IIUC, `3.0.0-1.17` was never released, and it looks like we don't have a 
plan to release this version: 
https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] cy2008 commented on pull request #22498: [FLINK-31928][build] Upgrade okhttp3 to 4.11.0

2023-05-04 Thread via GitHub


cy2008 commented on PR #22498:
URL: https://github.com/apache/flink/pull/22498#issuecomment-1535613125

   > @flinkbot run azure
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] libenchao commented on a diff in pull request #10594: [FLINK-15266] Fix NPE for case operator code gen in blink planner

2023-05-04 Thread via GitHub


libenchao commented on code in PR #10594:
URL: https://github.com/apache/flink/pull/10594#discussion_r1185649952


##
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/calls/ScalarOperatorGens.scala:
##
@@ -1287,12 +1287,16 @@ object ScalarOperatorGens {
|boolean $nullTerm;
|if (${condition.resultTerm}) {
|  ${trueAction.code}
-   |  $resultTerm = ${trueAction.resultTerm};
+   |  if (!${trueAction.nullTerm}) {

Review Comment:
   @kurnoolsaketh We didn't solve it, it just disappeared.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] cy2008 commented on pull request #22498: [FLINK-31928][build] Upgrade okhttp3 to 4.11.0

2023-05-04 Thread via GitHub


cy2008 commented on PR #22498:
URL: https://github.com/apache/flink/pull/22498#issuecomment-1535612435

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] zhipeng93 opened a new pull request, #237: [Flink-27826] Support machine learning training for very high dimensional models

2023-05-04 Thread via GitHub


zhipeng93 opened a new pull request, #237:
URL: https://github.com/apache/flink-ml/pull/237

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request adds the implementation of Naive Bayes 
algorithm.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Adds Transformer and Estimator implementation of Naive Bayes in Java 
and Python*
 - *Adds examples and documentations of Naive Bayes*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] wgtmac commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

2023-05-04 Thread via GitHub


wgtmac commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535588804

   Thanks @dongjoon-hyun and @pgaref. I will keep an eye on it and prepare RC2 
if required.
   
   BTW, it would be better to test 1.7.9-RC1 instead of 1.7.9-SNAPSHOT, but I 
think they have the same content except for the version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719580#comment-17719580
 ] 

Xintong Song commented on FLINK-31974:
--

cc [~wangyang0918]

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When resource quota limit is reached JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In *1.16.1*, this is handled gracefully:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:613)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:308)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$createTaskManagerPod$1(Fabric8FlinkKubeClient.java:163)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         ... 4 more
> {code}
> But *in Flink 1.17.0, the JobManager crashes*:
> {code}
> 2023-04-28 20:50:50,534 ERROR org.apache.flink.util.FatalExitExceptionHandler 
>              [] - FATAL: Thread 

[GitHub] [flink] snuyanzin commented on pull request #17830: [FLINK-24893][Table SQL/Client][FLIP-189] SQL Client prompts customization

2023-05-04 Thread via GitHub


snuyanzin commented on PR #17830:
URL: https://github.com/apache/flink/pull/17830#issuecomment-1535509074

   Since I rebased onto the latest sql-gateway and sql-client versions, 
@fsk119 may I ask you to have another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #22523: [FLINK-32006] Harden and speed up AsyncWaitOperatorTest#testProcessingTimeAlwaysTimeoutFunctionWithRetry

2023-05-04 Thread via GitHub


flinkbot commented on PR #22523:
URL: https://github.com/apache/flink/pull/22523#issuecomment-1535508362

   
   ## CI report:
   
   * 991f0249b767b51481e3ad04add3cb8604d9d38f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure

2023-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32006:
---
Labels: pull-request-available test-stability  (was: test-stability)

> AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry 
> times out on Azure
> --
>
> Key: FLINK-32006
> URL: https://issues.apache.org/jira/browse/FLINK-32006
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.18.0
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> {code:java}
> May 04 13:52:18 [ERROR] 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry
>   Time elapsed: 100.009 s  <<< ERROR!
> May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out 
> after 100 seconds
> May 04 13:52:18   at java.lang.Thread.sleep(Native Method)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277)
> May 04 13:52:18   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 04 13:52:18   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 04 13:52:18   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 04 13:52:18   at java.lang.reflect.Method.invoke(Method.java:498)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> May 04 13:52:18   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 04 13:52:18   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> May 04 13:52:18   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> May 04 13:52:18   at java.lang.Thread.run(Thread.java:748)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure

2023-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek reassigned FLINK-32006:
-

Assignee: David Morávek

> AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry 
> times out on Azure
> --
>
> Key: FLINK-32006
> URL: https://issues.apache.org/jira/browse/FLINK-32006
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.18.0
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
>  Labels: test-stability
>
> {code:java}
> May 04 13:52:18 [ERROR] 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry
>   Time elapsed: 100.009 s  <<< ERROR!
> May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out 
> after 100 seconds
> May 04 13:52:18   at java.lang.Thread.sleep(Native Method)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277)
> May 04 13:52:18   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 04 13:52:18   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 04 13:52:18   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 04 13:52:18   at java.lang.reflect.Method.invoke(Method.java:498)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> May 04 13:52:18   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 04 13:52:18   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> May 04 13:52:18   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> May 04 13:52:18   at java.lang.Thread.run(Thread.java:748)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dmvk opened a new pull request, #22523: [FLINK-32006] Harden and speed up AsyncWaitOperatorTest#testProcessingTimeAlwaysTimeoutFunctionWithRetry

2023-05-04 Thread via GitHub


dmvk opened a new pull request, #22523:
URL: https://github.com/apache/flink/pull/22523

   https://issues.apache.org/jira/browse/FLINK-32006


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] kurnoolsaketh commented on a diff in pull request #10594: [FLINK-15266] Fix NPE for case operator code gen in blink planner

2023-05-04 Thread via GitHub


kurnoolsaketh commented on code in PR #10594:
URL: https://github.com/apache/flink/pull/10594#discussion_r1185567460


##
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/calls/ScalarOperatorGens.scala:
##
@@ -1287,12 +1287,16 @@ object ScalarOperatorGens {
|boolean $nullTerm;
|if (${condition.resultTerm}) {
|  ${trueAction.code}
-   |  $resultTerm = ${trueAction.resultTerm};
+   |  if (!${trueAction.nullTerm}) {

Review Comment:
   @libenchao @JingsongLi 
   
   Hello! This might be a bit unrelated, but I am developing on Flink and am 
also running into a very similar `VerifyError` pointing to a Janino 
code-generated method (specifically in the Flink protobuf row serialization 
codepath). How did you work around/resolve this issue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuml07 commented on pull request #22522: [BP-1.17][FLINK-31984][fs][checkpoint] Savepoint should be relocatable if entropy injection is not effective

2023-05-04 Thread via GitHub


liuml07 commented on PR #22522:
URL: https://github.com/apache/flink/pull/22522#issuecomment-1535413955

   Tried locally and it can be cherry-picked to the `release-1.16` branch cleanly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #22522: [BP-1.17][FLINK-31984][fs][checkpoint] Savepoint should be relocatable if entropy injection is not effective

2023-05-04 Thread via GitHub


flinkbot commented on PR #22522:
URL: https://github.com/apache/flink/pull/22522#issuecomment-1535413802

   
   ## CI report:
   
   * ed6a6d56af6b23282a45a717b98f126017ab259a UNKNOWN
   
   Bot commands:
   The @flinkbot bot supports the following commands:
   
   - `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuml07 opened a new pull request, #22522: [FLINK-31984][fs][checkpoint] Savepoint should be relocatable if entropy injection is not effective

2023-05-04 Thread via GitHub


liuml07 opened a new pull request, #22522:
URL: https://github.com/apache/flink/pull/22522

   This backports FLINK-31984 aka #22510 to the 1.17 branch.
   
   The original backport 5c0a53534ed was reverted by #22518 because the API 
change caused [build 
failures](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48655=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=1316).
   
   As discussed in 
[FLINK-31984](https://issues.apache.org/jira/browse/FLINK-31984), we preserve 
the old public evolving API and introduce a new one. The old API is unlikely 
to be used elsewhere and is now `@Deprecated`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Sergio Sainz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719525#comment-17719525
 ] 

Sergio Sainz commented on FLINK-31974:
--

Hi [~mapohl] - let me set up a new cluster later on to get the full logs. Below 
please find the thread dump from the Flink 1.17.0 crash:

 
{code:java}
2023-04-28 20:50:50,305 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Received resource requirements from job 0a97c80a173b7ebb619c5b030b607520: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=1}]
...
2023-04-28 20:50:50,534 ERROR org.apache.flink.util.FatalExitExceptionHandler   
           [] - FATAL: Thread 'flink-akka.actor.default-dispatcher-15' produced 
an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
 Failure executing: POST at: 
https://10.96.0.1/api/v1/namespaces/env-my-namespace/pods. Message: 
Forbidden!Configured service account doesn't have access. Service account may 
have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
forbidden: exceeded quota: my-namespace-realtime-server-resource-quota, 
requested: limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
        at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source) 
~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
~[?:?]
        at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
 Failure executing: POST at: 
https://10.96.0.1/api/v1/namespaces/env-my-namespace/pods. Message: 
Forbidden!Configured service account doesn't have access. Service account may 
have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
forbidden: exceeded quota: my-namespace-realtime-server-resource-quota, 
requested: limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:613)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:308)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61)
 ~[flink-dist-1.17.0.jar:1.17.0]
        at 
org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$createTaskManagerPod$1(Fabric8FlinkKubeClient.java:163)
 ~[flink-dist-1.17.0.jar:1.17.0]
        ... 4 more
2023-04-28 20:50:50,602 ERROR org.apache.flink.util.FatalExitExceptionHandler   
           [] - Thread dump: 
"main" prio=5 Id=1 WAITING on 
java.util.concurrent.CompletableFuture$Signaller@2897b146
        at java.base@11.0.19/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.CompletableFuture$Signaller@2897b146
        at 
java.base@11.0.19/java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at 
java.base@11.0.19/java.util.concurrent.CompletableFuture$Signaller.block(Unknown
 Source)
        at 
java.base@11.0.19/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
        at 
java.base@11.0.19/java.util.concurrent.CompletableFuture.waitingGet(Unknown 
Source)
        at 

[GitHub] [flink] nateab commented on pull request #22359: [FLINK-31660][table] add jayway json path dependency to table-planner

2023-05-04 Thread via GitHub


nateab commented on PR #22359:
URL: https://github.com/apache/flink/pull/22359#issuecomment-1535370028

   @zentol I believe I've addressed all your feedback and got CI to pass again, 
so if you could take a final look, that would be great. Thanks for all your 
help reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-32007) Implement Python Wrappers for DynamoDB Connector

2023-05-04 Thread Ahmed Hamdy (Jira)
Ahmed Hamdy created FLINK-32007:
---

 Summary: Implement Python Wrappers for DynamoDB Connector
 Key: FLINK-32007
 URL: https://issues.apache.org/jira/browse/FLINK-32007
 Project: Flink
  Issue Type: New Feature
  Components: API / Python, Connectors / DynamoDB
Reporter: Ahmed Hamdy
 Fix For: 1.18.0


Implement Python API Wrappers for DynamoDB Sink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] architgyl commented on a diff in pull request #22509: [FLINK-31983] Add yarn Acls capability to Flink containers

2023-05-04 Thread via GitHub


architgyl commented on code in PR #22509:
URL: https://github.com/apache/flink/pull/22509#discussion_r1185469043


##
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java:
##
@@ -61,6 +61,9 @@
 class YARNSessionFIFOITCase extends YarnTestBase {
 private static final Logger log = 
LoggerFactory.getLogger(YARNSessionFIFOITCase.class);
 
+protected static final String VIEW_ACLS = "user groupUser";
+protected static final String MODIFY_ACLS = "admin groupAdmin";

Review Comment:
   sure, changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

2023-05-04 Thread via GitHub


pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535349716

   hey @dongjoon-hyun  -- thanks for keeping an eye!
   Just triggered a run on 
[1.7.9-SNAPSHOT](https://github.com/apache/flink/pull/22481/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R184).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-aws] vahmed-hamdy commented on a diff in pull request #70: [FLINK-31772] Adjusting Kinesis Ratelimiting strategy to fix performance regression

2023-05-04 Thread via GitHub


vahmed-hamdy commented on code in PR #70:
URL: 
https://github.com/apache/flink-connector-aws/pull/70#discussion_r1185454980


##
flink-connector-aws-kinesis-streams/src/test/java/org/apache/flink/connector/kinesis/sink/KinesisStreamsSinkWriterTest.java:
##
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.kinesis.sink;
+
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.connector.aws.testutils.AWSServicesTestUtils;
+import org.apache.flink.connector.base.sink.writer.ElementConverter;
+import org.apache.flink.connector.base.sink.writer.TestSinkInitContext;
+import 
org.apache.flink.connector.base.sink.writer.strategy.AIMDScalingStrategy;
+
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.Mockito;

Review Comment:
   Thanks @hlteoh37! I agree, as per the coding guidelines.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure

2023-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-32006:
--
Labels: test-stability  (was: )

> AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry 
> times out on Azure
> --
>
> Key: FLINK-32006
> URL: https://issues.apache.org/jira/browse/FLINK-32006
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.18.0
>Reporter: David Morávek
>Priority: Major
>  Labels: test-stability
>
> {code:java}
> May 04 13:52:18 [ERROR] 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry
>   Time elapsed: 100.009 s  <<< ERROR!
> May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out 
> after 100 seconds
> May 04 13:52:18   at java.lang.Thread.sleep(Native Method)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277)
> May 04 13:52:18   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 04 13:52:18   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 04 13:52:18   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 04 13:52:18   at java.lang.reflect.Method.invoke(Method.java:498)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> May 04 13:52:18   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 04 13:52:18   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> May 04 13:52:18   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> May 04 13:52:18   at java.lang.Thread.run(Thread.java:748)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure

2023-05-04 Thread Jira
David Morávek created FLINK-32006:
-

 Summary: 
AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry 
times out on Azure
 Key: FLINK-32006
 URL: https://issues.apache.org/jira/browse/FLINK-32006
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.18.0
Reporter: David Morávek


{code:java}
May 04 13:52:18 [ERROR] 
org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry
  Time elapsed: 100.009 s  <<< ERROR!
May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out 
after 100 seconds
May 04 13:52:18 at java.lang.Thread.sleep(Native Method)
May 04 13:52:18 at 
org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313)
May 04 13:52:18 at 
org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277)
May 04 13:52:18 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
May 04 13:52:18 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
May 04 13:52:18 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
May 04 13:52:18 at java.lang.reflect.Method.invoke(Method.java:498)
May 04 13:52:18 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
May 04 13:52:18 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
May 04 13:52:18 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
May 04 13:52:18 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
May 04 13:52:18 at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
May 04 13:52:18 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
May 04 13:52:18 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
May 04 13:52:18 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
May 04 13:52:18 at java.lang.Thread.run(Thread.java:748)
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dmvk commented on pull request #22521: [FLINK-32000][webui] Expose max parallelism in the vertex detail drawer.

2023-05-04 Thread via GitHub


dmvk commented on PR #22521:
URL: https://github.com/apache/flink/pull/22521#issuecomment-1535282266

   failed with https://issues.apache.org/jira/browse/FLINK-32006


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dmvk commented on pull request #22521: [FLINK-32000][webui] Expose max parallelism in the vertex detail drawer.

2023-05-04 Thread via GitHub


dmvk commented on PR #22521:
URL: https://github.com/apache/flink/pull/22521#issuecomment-1535281949

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] mridulm commented on a diff in pull request #22509: [FLINK-31983] Add yarn Acls capability to Flink containers

2023-05-04 Thread via GitHub


mridulm commented on code in PR #22509:
URL: https://github.com/apache/flink/pull/22509#discussion_r1185420037


##
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java:
##
@@ -61,6 +61,9 @@
 class YARNSessionFIFOITCase extends YarnTestBase {
 private static final Logger log = 
LoggerFactory.getLogger(YARNSessionFIFOITCase.class);
 
+protected static final String VIEW_ACLS = "user groupUser";
+protected static final String MODIFY_ACLS = "admin groupAdmin";

Review Comment:
   `groupUser` threw me off - `group` would be better :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuml07 commented on pull request #20935: [FLINK-29961][doc] Make referencing custom image clearer for Docker

2023-05-04 Thread via GitHub


liuml07 commented on PR #20935:
URL: https://github.com/apache/flink/pull/20935#issuecomment-1535210299

   Hi @MartijnVisser, could you take another look? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32005) Add a per-deployment error metric to signal about potential issues

2023-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32005:
---
Labels: pull-request-available  (was: )

> Add a per-deployment error metric to signal about potential issues
> --
>
> Key: FLINK-32005
> URL: https://issues.apache.org/jira/browse/FLINK-32005
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.5.0
>
>
> If any of the autoscaled deployments produce errors, they are only visible in 
> the logs or in the k8s events. Additionally, it would be good to have metrics 
> to detect any potential issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] mxm opened a new pull request, #585: [FLINK-32005] Add a per-deployment error metric to signal about potential issues

2023-05-04 Thread via GitHub


mxm opened a new pull request, #585:
URL: https://github.com/apache/flink-kubernetes-operator/pull/585

   If any of the autoscaled deployments produce errors, they are only visible 
in the logs or in the k8s events. Additionally, it would be good to have 
metrics to detect any potential issues.
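
   For illustration, a minimal sketch of how such a per-deployment error 
counter could be wired up through Flink's metric groups; the class and metric 
names are hypothetical and not necessarily what this PR implements.

{code:java}
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

/** Hypothetical sketch: surface per-deployment errors as a counter metric. */
public class DeploymentErrorMetrics {

    private final Counter errorCounter;

    public DeploymentErrorMetrics(MetricGroup operatorMetricGroup, String deploymentName) {
        // Scope the counter to one deployment so dashboards and alerts can
        // filter on a single FlinkDeployment.
        this.errorCounter =
                operatorMetricGroup.addGroup("deployment", deploymentName).counter("NumErrors");
    }

    /** Called wherever reconciliation or autoscaling surfaces an exception. */
    public void onError(Throwable error) {
        errorCounter.inc();
    }
}
{code}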


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-32005) Add a per-deployment error metric to signal about potential issues

2023-05-04 Thread Maximilian Michels (Jira)
Maximilian Michels created FLINK-32005:
--

 Summary: Add a per-deployment error metric to signal about 
potential issues
 Key: FLINK-32005
 URL: https://issues.apache.org/jira/browse/FLINK-32005
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler, Kubernetes Operator
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: kubernetes-operator-1.5.0


If any of the autoscaled deployments produce errors, they are only visible in 
the logs or in the k8s events. Additionally, it would be good to have metrics 
to detect any potential issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31867) Enforce a minimum number of observations within a metric window

2023-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31867:
---
Labels: pull-request-available  (was: )

> Enforce a minimum number of observations within a metric window
> ---
>
> Key: FLINK-31867
> URL: https://issues.apache.org/jira/browse/FLINK-31867
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.5.0
>
>
> The metric window is currently only time-based. We should make sure we see a 
> minimum number of observations to ensure we don't decide based on too few 
> observations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] mxm opened a new pull request, #584: [FLINK-31867] Enforce a minimum number of observations within a metric window

2023-05-04 Thread via GitHub


mxm opened a new pull request, #584:
URL: https://github.com/apache/flink-kubernetes-operator/pull/584

   This makes sure that a minimum number of observations have been taken within 
the metric window. Otherwise the metrics won't be returned, which prevents 
decisions from being made on too few observations.
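
   As a rough illustration of this guard (names hypothetical, assuming a 
time-keyed metric history; not necessarily the PR's actual code):

{code:java}
import java.time.Instant;
import java.util.Optional;
import java.util.SortedMap;

/** Hypothetical sketch: only hand out a metric window with enough observations. */
public final class MetricWindowGuard {

    private MetricWindowGuard() {}

    public static <T> Optional<SortedMap<Instant, T>> metricsIfReady(
            SortedMap<Instant, T> metricHistory, int minObservations) {
        // With too few data points, report "not ready" instead of handing a
        // thin window to the scaling evaluator.
        if (metricHistory.size() < minObservations) {
            return Optional.empty();
        }
        return Optional.of(metricHistory);
    }
}
{code}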


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32004) Intermittent ingress creation when JobManager is restarted via autoscaler

2023-05-04 Thread Tan Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tan Kim updated FLINK-32004:

Description: 
I set up access to the flink Web UI via the ingress settings in flinkDeployment.

Since this is an AWS environment, the ingress class uses ALB, not NGINX.

When using an AWS ALB with the flinkDeployment ingress settings, ingress 
creation intermittently fails when the JobManager is restarted via the 
autoscaler.

I say intermittent because the ingress is created irregularly, regardless of 
whether scaling is successful or not.

This issue occurs even though the initial deployment always succeeds in 
creating the ingress and accessing the Flink Web UI: when the job is scaled 
up/down via the autoscaler, the ingress is sometimes not created.

  was:
I set up access to the flink Web UI via the ingress settings in flinkDeployment.

Since this is an AWS environment, the ingress class uses ALB, not NGINX.

When using an AWS ALB with the flinkDeployment ingress settings, ingress 
creation intermittently fails when the JobManager is restarted via the 
autoscaler.

I say intermittent because the ingress is created irregularly, regardless of 
whether scaling is successful or not.

This issue occurs even though the initial deployment always succeeds in 
creating the ingress and accessing the Flink Web UI: when the job is scaled 
up/down via the autoscaler, the ingress is not created.


> Intermittent ingress creation when JobManager is restarted via autoscaler
> -
>
> Key: FLINK-32004
> URL: https://issues.apache.org/jira/browse/FLINK-32004
> Project: Flink
>  Issue Type: Bug
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Tan Kim
>Priority: Major
>
> I set up access to the flink Web UI via the ingress settings in 
> flinkDeployment.
> Since this is an AWS environment, the ingress class uses ALB, not NGINX.
> When using an AWS ALB with the flinkDeployment ingress settings, ingress 
> creation intermittently fails when the JobManager is restarted via the 
> autoscaler.
> I say intermittent because the ingress is created irregularly, regardless of 
> whether scaling is successful or not.
>  
> This issue occurs even though the initial deployment always succeeds in 
> creating the ingress and accessing the Flink Web UI: when the job is scaled 
> up/down via the autoscaler, the ingress is sometimes not created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32004) Intermittent ingress creation when JobManager is restarted via autoscaler

2023-05-04 Thread Tan Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tan Kim updated FLINK-32004:

Description: 
I set up access to the flink Web UI via the ingress settings in flinkDeployment.

Since this is an AWS environment, the ingress class uses ALB, not NGINX.

When using an AWS ALB with the flinkDeployment ingress settings, ingress 
creation intermittently fails when the JobManager is restarted via the 
autoscaler.

I say intermittent because the ingress is created irregularly, regardless of 
whether scaling is successful or not.

This issue occurs even though the initial deployment always succeeds in 
creating the ingress and accessing the Flink Web UI: when the job is scaled 
up/down via the autoscaler, the ingress is not created.

  was:
I set up access to the flink Web UI via the ingress settings in flinkDeployment.

Since this is an AWS environment, the ingress class uses ALB, not NGINX.

When using an AWS ALB with the flinkDeployment ingress settings, ingress 
creation intermittently fails when the JobManager is restarted via the 
autoscaler.

I say intermittent because the ingress is created irregularly, regardless of 
whether scaling is successful or not.


> Intermittent ingress creation when JobManager is restarted via autoscaler
> -
>
> Key: FLINK-32004
> URL: https://issues.apache.org/jira/browse/FLINK-32004
> Project: Flink
>  Issue Type: Bug
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Tan Kim
>Priority: Major
>
> I set up access to the flink Web UI via the ingress settings in 
> flinkDeployment.
> Since this is an AWS environment, the ingress class uses ALB, not NGINX.
> When using an AWS ALB with the flinkDeployment ingress settings, ingress 
> creation intermittently fails when the JobManager is restarted via the 
> autoscaler.
> I say intermittent because the ingress is created irregularly, regardless of 
> whether scaling is successful or not.
>  
> This issue occurs even though the initial deployment always succeeds in 
> creating the ingress and accessing the Flink Web UI: when the job is scaled 
> up/down via the autoscaler, the ingress is not created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32004) Intermittent ingress creation when JobManager is restarted via autoscaler

2023-05-04 Thread Tan Kim (Jira)
Tan Kim created FLINK-32004:
---

 Summary: Intermittent ingress creation when JobManager is 
restarted via autoscaler
 Key: FLINK-32004
 URL: https://issues.apache.org/jira/browse/FLINK-32004
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler, Kubernetes Operator
Reporter: Tan Kim


I set up access to the flink Web UI via the ingress settings in flinkDeployment.

Since this is an AWS environment, the ingress class uses ALB, not NGINX.

When using an AWS ALB with the flinkDeployment ingress settings, ingress 
creation intermittently fails when the JobManager is restarted via the 
autoscaler.

I say intermittent because the ingress is created irregularly, regardless of 
whether scaling is successful or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31977) If scaling.effectiveness.detection.enabled is false, the call to the detectIneffectiveScaleUp() function is unnecessary

2023-05-04 Thread Tan Kim (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719423#comment-17719423
 ] 

Tan Kim commented on FLINK-31977:
-

I understand what you're saying.
However, there's a part of the code that's a bit hard to understand because it 
has multiple returns based on conditional statements.
I think I can change it to something more intuitive like this. What do you 
think?

This is unrelated to the original suggestion in the Jira ticket to avoid the 
unnecessary computation when SCALING_EFFECTIVENESS_DETECTION_ENABLED is 
disabled, but it does make the code a little easier to understand.
{code:java}
private boolean detectIneffectiveScaleUp(
        AbstractFlinkResource<?, ?> resource,
        JobVertexID vertex,
        Configuration conf,
        Map<ScalingMetric, EvaluatedScalingMetric> evaluatedMetrics,
        ScalingSummary lastSummary) {

    double lastProcRate =
            lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage();
    double lastExpectedProcRate =
            lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent();
    var currentProcRate = evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();

    // To judge the effectiveness of the scale up operation we compute how much
    // of the expected increase actually happened. For example if we expect a
    // 100 increase in proc rate and only got an increase of 10 we only
    // accomplished 10% of the desired increase. If this number is below the
    // threshold, we mark the scaling ineffective.
    double expectedIncrease = lastExpectedProcRate - lastProcRate;
    double actualIncrease = currentProcRate - lastProcRate;

    boolean isInEffectiveScaleUp =
            (actualIncrease / expectedIncrease)
                    < conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);

    if (isInEffectiveScaleUp) {
        var message = String.format(INNEFFECTIVE_MESSAGE_FORMAT, vertex);

        eventRecorder.triggerEvent(
                resource,
                EventRecorder.Type.Normal,
                EventRecorder.Reason.IneffectiveScaling,
                EventRecorder.Component.Operator,
                message);

        if (conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_DETECTION_ENABLED)) {
            LOG.info(
                    "Ineffective scaling detected for {}, expected increase {}, actual {}",
                    vertex,
                    expectedIncrease,
                    actualIncrease);
            return true;
        }
    }
    return false;
} {code}

> If scaling.effectiveness.detection.enabled is false, the call to the 
> detectIneffectiveScaleUp() function is unnecessary
> ---
>
> Key: FLINK-31977
> URL: https://issues.apache.org/jira/browse/FLINK-31977
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler
>Affects Versions: 1.17.0
>Reporter: Tan Kim
>Priority: Minor
>
> The code below is the function that detects ineffective scale-ups.
> It only checks the value of SCALING_EFFECTIVENESS_DETECTION_ENABLED 
> (scaling.effectiveness.detection.enabled) after performing all the 
> computations needed for detection, so when the option is false those 
> computations are unnecessary.
> {code:java}
> JobVertexScaler.java #175
> private boolean detectIneffectiveScaleUp(
> AbstractFlinkResource<?, ?> resource,
> JobVertexID vertex,
> Configuration conf,
> Map<ScalingMetric, EvaluatedScalingMetric> evaluatedMetrics,
> ScalingSummary lastSummary) {
> double lastProcRate = 
> lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage(); // 
> 22569.315633422066
> double lastExpectedProcRate =
> 
> lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent(); // 
> 37340.0
> var currentProcRate = 
> evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();
> // To judge the effectiveness of the scale up operation we compute how 
> much of the expected
> // increase actually happened. For example if we expect a 100 increase in 
> proc rate and only
> // got an increase of 10 we only accomplished 10% of the desired 
> increase. If this number is
> // below the threshold, we mark the scaling ineffective.
> double expectedIncrease = lastExpectedProcRate - lastProcRate;
> double actualIncrease = currentProcRate - lastProcRate;
> boolean withinEffectiveThreshold =
> (actualIncrease / expectedIncrease)
> >= 
> conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);
> if (withinEffectiveThreshold) {
> return false;
> }
> var message = String.format(INNEFFECTIVE_MESSAGE_FORMAT, vertex);
> eventRecorder.triggerEvent(
> resource,
> EventRecorder.Type.Normal,
> EventRecorder.Reason.IneffectiveScaling,
> 

[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

2023-05-04 Thread via GitHub


dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535071204

   If some patches are needed to help this PR, the Apache ORC community can cut 
RC2 accordingly to help the Apache Flink community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

2023-05-04 Thread via GitHub


dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535069827

   BTW, @pgaref, as you know, @wgtmac started the Apache ORC 1.7.9 RC1. Can we 
test `RC1` in this PR?
   - https://github.com/apache/orc/issues/1437


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

2023-05-04 Thread via GitHub


pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535056878

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa commented on pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


reswqa commented on PR #22447:
URL: https://github.com/apache/flink/pull/22447#issuecomment-1535049629

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-32003) Release 3.0.0-1.16 and 1.16.1 doesn't work with OAuth2

2023-05-04 Thread Neng Lu (Jira)
Neng Lu created FLINK-32003:
---

 Summary: Release 3.0.0-1.16 and 1.16.1 doesn't work with OAuth2
 Key: FLINK-32003
 URL: https://issues.apache.org/jira/browse/FLINK-32003
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Reporter: Neng Lu


The release for 3.0.0-1.16 and 1.16.1 depends on Pulsar client version 2.10.1.

There is an issue using OAuth2 with this client version, which results in the 
following error.


{code:java}
Exception in thread "main" java.lang.RuntimeException: 
org.apache.pulsar.client.admin.PulsarAdminException: 
java.util.concurrent.CompletionException: 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: 
Could not complete the operation. Number of retries has been exhausted. Failed 
reason: 
https://func-test-31a67160-533f-4a5f-81a8-30b6221f34a9.gcp-shared-gcp-usce1-martin.streamnative.g.snio.cloud:443
at me.nlu.pulsar.PulsarAdminTester.main(PulsarAdminTester.java:56)
Caused by: org.apache.pulsar.client.admin.PulsarAdminException: 
java.util.concurrent.CompletionException: 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: 
Could not complete the operation. Number of retries has been exhausted. Failed 
reason: 
https://func-test-31a67160-533f-4a5f-81a8-30b6221f34a9.gcp-shared-gcp-usce1-martin.streamnative.g.snio.cloud:443
at 
org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:251)
at 
org.apache.pulsar.client.admin.internal.TopicsImpl$1.failed(TopicsImpl.java:187)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:882)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:247)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:242)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.access$100(ClientRuntime.java:62)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.lambda$failure$1(ClientRuntime.java:178)
at 
org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at 
org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at 
org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at 
org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at 
org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at 
org.apache.pulsar.shade.org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:288)
at 
org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.failure(ClientRuntime.java:178)
at 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$apply$1(AsyncHttpConnector.java:218)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$4(AsyncHttpConnector.java:277)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.pulsar.shade.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:273)
at 
org.apache.pulsar.shade.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:181)
at 
org.apache.pulsar.shade.org.asynchttpclient.netty.channel.NettyConnectListener$1.onFailure(NettyConnectListener.java:151)
at 
org.apache.pulsar.shade.org.asynchttpclient.netty.SimpleFutureListener.operationComplete(SimpleFutureListener.java:26)
at 
org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at 
org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
at 
org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
at 

[jira] [Commented] (FLINK-31984) Savepoint on S3 should be relocatable if entropy injection is not effective

2023-05-04 Thread Mingliang Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719402#comment-17719402
 ] 

Mingliang Liu commented on FLINK-31984:
---

Thanks for catching this [~mapohl]. Sorry, it is my fault for breaking the 
public API without a closer look into EntropyInjector. I completely agree with 
[~xtsong] on preserving (and deprecating?) the old public method while 
introducing the new one. I will provide a patch shortly for 1.16 and 1.17.

> Savepoint on S3 should be relocatable if entropy injection is not effective
> ---
>
> Key: FLINK-31984
> URL: https://issues.apache.org/jira/browse/FLINK-31984
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems, Runtime / Checkpointing
>Affects Versions: 1.16.1
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> We have a limitation that if we create savepoints with injected entropy, they 
> are not relocatable 
> (https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#triggering-savepoints).
> FLINK-25952 improves the check by inspecting both the FileSystem extending 
> {{EntropyInjectingFileSystem}} and 
> {{FlinkS3FileSystem#getEntropyInjectionKey}} not returning null. We can 
> improve this further by checking the checkpoint path is indeed using the 
> entropy injection key. Without that, the savepoint is not relocatable even if 
> the {{state.savepoints.dir}} does not contain the entropy.
> In our setting, we enable entropy injection by setting {{s3.entropy.key}} to 
>  {{\__ENTROPY_KEY\__}} and use the entropy key in the checkpoint path (for 
> e.g. {{s3://mybuket/checkpoints/__ENTROPY_KEY__/myapp}}). However, in the 
> savepoint path, we don't use the entropy key (for e.g. 
> {{s3://mybuket/savepoints/myapp}}) because we want the savepoint to be 
> relocatable. But the current logic still generates non-relocatable savepoint 
> path just because the entropy injection key is non-null.
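
A minimal sketch of the improved check described above (helper name 
hypothetical): entropy injection only makes a path non-relocatable when the 
configured key actually appears in that path.

{code:java}
import javax.annotation.Nullable;

/** Hypothetical sketch of the relocatability check. */
public final class EntropyCheck {

    private EntropyCheck() {}

    public static boolean isEntropyEffective(@Nullable String entropyKey, String path) {
        // No entropy key configured: injection can never rewrite this path.
        if (entropyKey == null) {
            return false;
        }
        // Key configured but absent from the path, e.g. a savepoint dir like
        // s3://mybuket/savepoints/myapp: the savepoint stays relocatable.
        return path.contains(entropyKey);
    }
}
{code}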



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] XComp commented on pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into DefaultLea

2023-05-04 Thread via GitHub


XComp commented on PR #22422:
URL: https://github.com/apache/flink/pull/22422#issuecomment-1535006196

   Did a final rebase after PR #22517 was merged to `master` to run the test 
suite with the proper `DirectExecutorService` implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-31995) DirectExecutorService doesn't follow the ExecutorService contract throwing a RejectedExecutionException in case it's already shut down

2023-05-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-31995.
---
Fix Version/s: 1.18.0
 Assignee: Matthias Pohl
   Resolution: Fixed

master: 8d430f51d77cf4f0d3291da6a7333f1aa9a87d22

> DirectExecutorService doesn't follow the ExecutorService contract throwing a 
> RejectedExecutionException in case it's already shut down
> --
>
> Key: FLINK-31995
> URL: https://issues.apache.org/jira/browse/FLINK-31995
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.16.1, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> We experienced an issue where we tested behavior using the 
> {{DirectExecutorService}} with the {{ExecutorService}} being shut down 
> already. The tests succeeded. The production code failed, though, because we 
> used a singleThreadExecutor for which the task execution failed after the 
> executor was stopped. We should add this behavior to the 
> {{DirectExecutorService}} as well to make the unit tests closer to the 
> production code.
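
For illustration, a minimal sketch of the contract being enforced (class name 
hypothetical): a direct, same-thread executor must reject new tasks once it 
has been shut down, just like a singleThreadExecutor does in production.

{code:java}
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical sketch: same-thread executor honoring the shutdown contract. */
public class DirectExecutorSketch implements Executor {

    private volatile boolean shutdown = false;

    public void shutdown() {
        shutdown = true;
    }

    @Override
    public void execute(Runnable command) {
        if (shutdown) {
            // Mirrors what a real singleThreadExecutor does after shutdown.
            throw new RejectedExecutionException("Executor is already shut down.");
        }
        command.run(); // execute directly in the caller's thread
    }
}
{code}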



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] XComp merged pull request #22517: [FLINK-31995][tests] Adds shutdown check to DirectExecutorService

2023-05-04 Thread via GitHub


XComp merged PR #22517:
URL: https://github.com/apache/flink/pull/22517


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] zoltar9264 commented on pull request #22458: [FLINK-31743][statebackend/rocksdb] disable rocksdb log relocating wh…

2023-05-04 Thread via GitHub


zoltar9264 commented on PR #22458:
URL: https://github.com/apache/flink/pull/22458#issuecomment-1534969569

   Thanks for your reminder @fredia, but I think this PR is just a hotfix. I 
will submit a PR to the frocksdb repository to finally fix this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-web] echauchot commented on pull request #643: Add blog article: Howto test a batch source with the new Source framework

2023-05-04 Thread via GitHub


echauchot commented on PR #643:
URL: https://github.com/apache/flink-web/pull/643#issuecomment-1534918324

   @MartijnVisser @zentol did almost all my last reviews. Would you have time 
to review this small article, since you know the source framework?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-web] echauchot commented on pull request #631: Add blog article: Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2023-05-04 Thread via GitHub


echauchot commented on PR #631:
URL: https://github.com/apache/flink-web/pull/631#issuecomment-1534879847

   @zentol how does this article look after the last modifications?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719355#comment-17719355
 ] 

Thomas Weise commented on FLINK-31974:
--

There are many cases where errors are transient. This specific case is actually 
quite obvious: the resource availability on a large cluster changes constantly, 
so a pod that cannot be scheduled now may be scheduled a few seconds later. 
Other k8s related issues can also be transient; for example, a failed request 
due to rate limiting will likely succeed soon after, and we would actually make 
things worse by not following a backoff/retry strategy and simply letting the 
job fail. I'm also leaning towards a retry-by-default strategy that identifies 
the cases that should be treated as fatal errors.
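
A minimal sketch of such a backoff/retry strategy, with illustrative names 
(this is not Flink's actual resource manager code):
{code:java}
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry a transient action with exponential backoff
// instead of failing the job on the first error. Assumes maxAttempts >= 1.
final class RetryWithBackoff {
    static <T> T retry(Callable<T> action, int maxAttempts, Duration initialDelay)
            throws Exception {
        Duration delay = initialDelay;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // treat as transient unless classified as fatal
                Thread.sleep(delay.toMillis());
                delay = delay.multipliedBy(2); // exponential backoff
            }
        }
        throw last;
    }
}
{code}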

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When resource quota limit is reached JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:613)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:308)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         

[GitHub] [flink] dmvk commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


dmvk commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185085549


##
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java:
##
@@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo(
  */
 @Override
 public void fail(Throwable t) {
-processFail(t, true);
+processFail(t, true, Collections.emptyMap());

Review Comment:
   Would this be enriched once the 
https://github.com/apache/flink/pull/22506#discussion_r1185081996 is addressed, 
since `fromSchedulerNg` will be set to `false`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] LadyForest commented on a diff in pull request #22515: [FLINK-31952][table] Support 'EXPLAIN' statement for CompiledPlan.

2023-05-04 Thread via GitHub


LadyForest commented on code in PR #22515:
URL: https://github.com/apache/flink/pull/22515#discussion_r1184990962


##
flink-table/flink-sql-parser/src/test/java/org/apache/flink/sql/parser/FlinkSqlParserImplTest.java:
##
@@ -2072,6 +2072,13 @@ void testExplainPlanFor() {
 this.sql(sql).ok(expected);
 }
 
+@Test
+void testExplainForJsonFile() {
+String sql = "explain plan for 'file://path'";
+String expected = "EXPLAIN 'file://path'";

Review Comment:
   Please add more cases to cover `ExplainDetail`s, e.g.
   ```java
   this.sql("explain changelog_mode '/mydir/plan.json'")
           .ok("EXPLAIN CHANGELOG_MODE '/mydir/plan.json'");
   ```



##
flink-table/flink-sql-parser/src/test/java/org/apache/flink/sql/parser/FlinkSqlParserImplTest.java:
##
@@ -2072,6 +2072,13 @@ void testExplainPlanFor() {
 this.sql(sql).ok(expected);
 }
 
+@Test
+void testExplainForJsonFile() {

Review Comment:
   nit: `testExplainCompiledPlan`



##
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java:
##
@@ -915,6 +916,16 @@ public TableResultInternal executeInternal(Operation 
operation) {
 return executeInternal(((StatementSetOperation) 
operation).getOperations());
 } else if (operation instanceof ExplainOperation) {
 ExplainOperation explainOperation = (ExplainOperation) operation;
+if (explainOperation instanceof ExplainFileOperation) {
+ExplainFileOperation explainfileOperation = 
(ExplainFileOperation) explainOperation;
+CompiledPlan cp =
+
loadPlan(PlanReference.fromFile(explainfileOperation.getFilePath()));

Review Comment:
   Why just load the plan and return it?  I think we should call 
`compiledPlan#explain(...)`



##
flink-table/flink-sql-parser/src/main/java/org/apache/flink/sql/parser/dql/SqlRichExplain.java:
##
@@ -58,6 +60,10 @@ public SqlNode getStatement() {
 return statement;
 }
 
+public String getPlanFile() {

Review Comment:
   It might be weird to add this method here. `SqlRichExplain` is a generalized 
representation of the syntax, while `getPlanFile` is only suitable for `EXPLAIN 
'plan.json'`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dmvk commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling

2023-05-04 Thread via GitHub


dmvk commented on code in PR #22506:
URL: https://github.com/apache/flink/pull/22506#discussion_r1185081996


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -473,26 +500,50 @@ public CompletableFuture cancel(Time 
timeout) {
 @Override
 public CompletableFuture updateTaskExecutionState(
 final TaskExecutionState taskExecutionState) {
-FlinkException taskExecutionException;
+checkNotNull(taskExecutionState, "taskExecutionState");
+// Use the main/caller thread for all updates to make sure they are 
processed in order.
+// (MainThreadExecutor i.e., the akka thread pool does not guarantee 
that)
+// Only detach for a FAILED state update that is terminal and may 
perform io heavy labeling.
+if 
(ExecutionState.FAILED.equals(taskExecutionState.getExecutionState())) {
+return labelFailure(taskExecutionState)
+.thenApplyAsync(
+taskStateWithLabels -> {
+try {
+return 
doUpdateTaskExecutionState(taskStateWithLabels);
+} catch (FlinkException e) {
+throw new CompletionException(e);
+}
+},
+getMainThreadExecutor());
+}
 try {
-checkNotNull(taskExecutionState, "taskExecutionState");
+return CompletableFuture.completedFuture(
+doUpdateTaskExecutionState(taskExecutionState));
+} catch (FlinkException e) {
+return FutureUtils.completedExceptionally(e);
+}
+}
 
+private Acknowledge doUpdateTaskExecutionState(final TaskExecutionState 
taskExecutionState)
+throws FlinkException {
+@Nullable FlinkException taskExecutionException;
+try {
 if (schedulerNG.updateTaskExecutionState(taskExecutionState)) {

Review Comment:
   That's a good catch; the most straightforward way to address this would be 
to change the listener to go through JobMaster#updateTaskExecutionState instead 
so we have a shared (semi-asynchronous) code path for all execution state 
transitions.
   
   `ExecutionDeploymentReconciliationHandler#onMissingDeploymentsOf` seems to 
have the same problem (and the fix could most likely be along the same lines).
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-31970) "Key group 0 is not in KeyGroupRange" when using CheckpointedFunction

2023-05-04 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719336#comment-17719336
 ] 

Piotr Nowojski edited comment on FLINK-31970 at 5/4/23 1:53 PM:


[~YordanPavlov] I was not able to reproduce your error; however, you are 
incorrectly working with the state in your {{TestWindow}} function. Please take 
a look at the 
[documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state],
 especially the following paragraph:
{quote}
It is important to keep in mind that these state objects are only used for 
interfacing with state. The state is not necessarily stored inside but might 
reside on disk or somewhere else. The second thing to keep in mind is that the 
value you get from the state depends on the key of the input element. So the 
value you get in one invocation of your user function can differ from the value 
in another invocation if the keys involved are different.
{quote}
The first problem is that your {{TestWindow#snapshotState}} method is trying to 
access the {{state}} field of a keyed operator/function inside a method that 
doesn't have a key context. That's the error I was getting when I tried to run 
your code. 
Secondly, your {{TestWindow#apply}} method is invoked within some key 
context, but you are storing the actual {{count}} as a regular Java field. So 
if you ever had more than one key (that's not the case in your example, as you 
are using the {{.keyBy(_ => 1)}} key selector), all of the keys processed by a 
single parallel instance of the window operator would be accumulated in the same 
{{count}} field, which doesn't make any sense.

The corrected window function should look like this:

{code:scala}
class CorrectTestWindow() extends WindowFunction[(Long, Int), Long, Int, 
TimeWindow] with CheckpointedFunction {
  var state: ValueState[Long] = _

  override def snapshotState(functionSnapshotContext: FunctionSnapshotContext): 
Unit = {
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
val storeDescriptor = new 
ValueStateDescriptor[Long]("state-xrp-dex-pricer", createTypeInformation[Long])
state = context.getKeyedStateStore.getState(storeDescriptor)
  }

  override def apply(key: Int, window: TimeWindow, input: lang.Iterable[(Long, 
Int)], out: Collector[Long]): Unit = {
val count = state.value() + input.size
state.update(count)
out.collect(count)
  }
}
{code}
However, even that is kind of strange. You are creating a tumbling window every 
1ms, only to aggregate the {{count}} across all past tumbling windows for the 
given key? If you want to aggregate the count only across the given tumbling 
window, your apply could be as simple as:
{code:scala}
  override def apply(key: Int, window: TimeWindow, input: lang.Iterable[(Long, 
Int)], out: Collector[Long]): Unit = {
out.collect(input.size)
  }
{code}
And you don't need any additional state for that. On the other hand, if you 
want to aggregate results globally, you can use
{code:scala}
.keyBy(_ => 1)
.window(GlobalWindows.create())
.trigger(/* maybe a custom trigger here */)
.apply(new TestWindow())
{code}



[jira] [Commented] (FLINK-31970) "Key group 0 is not in KeyGroupRange" when using CheckpointedFunction

2023-05-04 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719336#comment-17719336
 ] 

Piotr Nowojski commented on FLINK-31970:


[~YordanPavlov] I was not able to reproduce your error; however, you are 
incorrectly working with the state in your {{TestWindow}} function. Please take 
a look at the 
[documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state],
 especially the following paragraph:
{quote}
It is important to keep in mind that these state objects are only used for 
interfacing with state. The state is not necessarily stored inside but might 
reside on disk or somewhere else. The second thing to keep in mind is that the 
value you get from the state depends on the key of the input element. So the 
value you get in one invocation of your user function can differ from the value 
in another invocation if the keys involved are different.
{quote}
The first problem is that your {{TestWindow#snapshotState}} method is trying to 
access the {{state}} field of a keyed operator/function inside a method that 
doesn't have a key context. That's the error I was getting when I tried to run 
your code. 
Secondly, your {{TestWindow#apply}} method is invoked within some key 
context, but you are storing the actual {{count}} as a regular Java field. So 
if you ever had more than one key (that's not the case in your example, as you 
are using the {{.keyBy(_ => 1)}} key selector), all of the keys processed by a 
single parallel instance of the window operator would be accumulated in the same 
{{count}} field, which doesn't make any sense.

The corrected window function should look like this:

{code:scala}
class CorrectTestWindow() extends WindowFunction[(Long, Int), Long, Int, 
TimeWindow] with CheckpointedFunction {
  var state: ValueState[Long] = _

  override def snapshotState(functionSnapshotContext: FunctionSnapshotContext): 
Unit = {
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
val storeDescriptor = new 
ValueStateDescriptor[Long]("state-xrp-dex-pricer", createTypeInformation[Long])
state = context.getKeyedStateStore.getState(storeDescriptor)
  }

  override def apply(key: Int, window: TimeWindow, input: lang.Iterable[(Long, 
Int)], out: Collector[Long]): Unit = {
val count = state.value() + input.size
state.update(count)
out.collect(count)
  }
}
{code}
However, even that is kind of strange. You are creating a tumbling window every 
1ms, only to aggregate the {{count}} across all past tumbling windows for the 
given key? If you want to aggregate the count only across the given tumbling 
window, your apply could be as simple as:
{code:scala}
  override def apply(key: Int, window: TimeWindow, input: lang.Iterable[(Long, 
Int)], out: Collector[Long]): Unit = {
out.collect(input.size)
  }
{code}
And you don't need any additional state for that. On the other hand, if you 
want to aggregate results globally, you can use
{code:scala}
.keyBy(_ => 1)
.window(GlobalWindows.create())
.trigger(/* maybe a custom trigger here */)
.apply(new TestWindow())
{code}

> "Key group 0 is not in KeyGroupRange" when using CheckpointedFunction
> -
>
> Key: FLINK-31970
> URL: https://issues.apache.org/jira/browse/FLINK-31970
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.17.0
>Reporter: Yordan Pavlov
>Priority: Major
> Attachments: fill-topic.sh, main.scala
>
>
> I am experiencing a problem where the following exception would be thrown on 
> Flink stop (stop with savepoint):
>  
> {code:java}
> org.apache.flink.util.SerializedThrowable: 
> java.lang.IllegalArgumentException: Key group 0 is not in 
> KeyGroupRange{startKeyGroup=86, endKeyGroup=127}.{code}
>  
> I do not have a non-deterministic keyBy() operator; in fact, I use 
> {code:java}
> .keyBy(_ => 1){code}
> I believe the problem is related to using RocksDB state along with a 
> {code:java}
> CheckpointedFunction{code}
> In my test program I have commented out a reduction of the parallelism which 
> would make the problem go away. I am attaching a standalone program which 
> presents the problem and also a script which generates the input data. For 
> clarity I would paste here the essence of the job:
>  
>  
> {code:scala}
> env.fromSource(kafkaSource, watermarkStrategy, "KafkaSource")
> .setParallelism(3)
> .keyBy(_ => 1)
> .window(TumblingEventTimeWindows.of(Time.of(1, TimeUnit.MILLISECONDS)))
> .apply(new TestWindow())
> /* .setParallelism(1) this would prevent the problem */
> .uid("window tester")
> .name("window tester")
> .print()
> class TestWindow() extends WindowFunction[(Long, Int), Long, Int, 

[jira] [Assigned] (FLINK-31867) Enforce a minimum number of observations within a metric window

2023-05-04 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-31867:
--

Assignee: Maximilian Michels

> Enforce a minimum number of observations within a metric window
> ---
>
> Key: FLINK-31867
> URL: https://issues.apache.org/jira/browse/FLINK-31867
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: kubernetes-operator-1.5.0
>
>
> The metric window is currently only time-based. We should additionally enforce 
> a minimum number of observations so that scaling decisions are not based on 
> too little data.
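> 
> A minimal sketch of such a gate, with illustrative names (the actual 
> autoscaler implementation may differ):
> {code:java}
> import java.time.Duration;
> import java.time.Instant;
> import java.util.NavigableMap;
> import java.util.TreeMap;
> 
> // Hypothetical sketch: only emit an aggregate once the metric window
> // contains enough observations, in addition to covering enough time.
> final class MetricWindow {
>     private final NavigableMap<Instant, Double> samples = new TreeMap<>();
> 
>     void add(Instant timestamp, double value) {
>         samples.put(timestamp, value);
>     }
> 
>     /** Returns the average, or null if the window is not trustworthy yet. */
>     Double evaluate(Instant now, Duration windowSize, int minObservations) {
>         samples.headMap(now.minus(windowSize)).clear(); // evict old samples
>         if (samples.size() < minObservations) {
>             return null; // too few observations for a scaling decision
>         }
>         return samples.values().stream()
>                 .mapToDouble(Double::doubleValue)
>                 .average()
>                 .orElse(Double.NaN);
>     }
> }
> {code}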



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32002) Revisit autoscaler defaults for next release

2023-05-04 Thread Maximilian Michels (Jira)
Maximilian Michels created FLINK-32002:
--

 Summary: Revisit autoscaler defaults for next release
 Key: FLINK-32002
 URL: https://issues.apache.org/jira/browse/FLINK-32002
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler, Kubernetes Operator
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: kubernetes-operator-1.5.0


We haven't put much thought into the defaults. We should revisit and adjust the 
defaults to fit most use cases without being overly aggressive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31867) Enforce a minimum number of observations within a metric window

2023-05-04 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-31867:
---
Fix Version/s: kubernetes-operator-1.5.0

> Enforce a minimum number of observations within a metric window
> ---
>
> Key: FLINK-31867
> URL: https://issues.apache.org/jira/browse/FLINK-31867
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Maximilian Michels
>Priority: Major
> Fix For: kubernetes-operator-1.5.0
>
>
> The metric window is currently only time-based. We should additionally enforce 
> a minimum number of observations so that scaling decisions are not based on 
> too little data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719331#comment-17719331
 ] 

Xintong Song commented on FLINK-31974:
--

[~gyfora],

bq. Flink treats only very few errors fatal. IO errors, connector (source/sink 
) errors etc all cause job restarts and in many cases "Flink cannot recover 
from by itself". You actually expect the error to be temporary and hopefully 
not get it after the restart. So I think it would be generally inconsistent 
with the current error handling behaviour if resource manager errors would 
simply let the job die fatally and not retry in the same way.

I think the difference here is that IO errors and connector errors affect the 
job but not the Flink cluster / deployment. Thinking of a session cluster, we 
should not fail the cluster for an error from a single job. But the resource 
manager interacting with the Kubernetes API server is a cluster-level behavior, 
and conceptually we don't distinguish resources for individual jobs until the 
slots are allocated. Moreover, it's possible that multiple jobs share the same 
resource (pod). One could argue that in application mode the cluster / 
deployment is equivalent to the job. However, the cluster mode (session / 
application) is transparent to the resource manager.

 bq. Flink jobs/clusters should be resilient and keep retrying in case of 
errors and should not give up especially for streaming workloads.

This is different from the feedback I get from our production environment. But 
I can understand if that's what some users want. So I guess it might be worth a 
configuration option, as you suggested.

[~mbalassi],

+1 to what you said about the specific case. I think there's a consensus that 
reaching the quota limit should not be treated as a fatal error.
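
A minimal sketch of a retry-by-default classifier, with illustrative checks 
(this is not Flink's actual error handling code):
{code:java}
// Hypothetical sketch: retry by default and only treat explicitly
// identified cases as fatal. The string checks are purely illustrative.
final class ErrorClassifier {
    static boolean isFatal(Throwable t) {
        String message = String.valueOf(t.getMessage());
        // Quota violations and rate limiting are transient: retry with backoff.
        if (message.contains("exceeded quota") || message.contains("rate limit")) {
            return false;
        }
        // JVM errors (e.g. OutOfMemoryError) remain fatal.
        return t instanceof Error;
    }
}
{code}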

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When resource quota limit is reached JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  

[jira] [Closed] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working

2023-05-04 Thread Sriram Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Ganesh closed FLINK-31135.
-
Resolution: Not A Bug

It was an intermittent issue. 

> ConfigMap DataSize went > 1 MB and cluster stopped working
> --
>
> Key: FLINK-31135
> URL: https://issues.apache.org/jira/browse/FLINK-31135
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.2.0
>Reporter: Sriram Ganesh
>Priority: Major
> Attachments: dump_cm.yaml, 
> flink--kubernetes-application-0-parked-logs-ingestion-644b80-b4bc58747-lc865.log.zip,
>  image-2023-04-19-09-48-19-089.png, image-2023-05-03-13-47-51-440.png, 
> image-2023-05-03-13-50-54-783.png, image-2023-05-03-13-51-21-685.png, 
> jobmanager_log.txt, 
> parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml
>
>
> I am using the Flink Operator to manage clusters. Flink version: 1.15.2. Flink 
> jobs failed with the below error. It seems the ConfigMap size went beyond 1 MB 
> (the default limit). 
> Since it is managed by the operator and config maps are not updated with any 
> manual intervention, I suspect it could be an operator issue. 
>  
> {code:java}
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: PUT at: 
> https:///api/v1/namespaces//configmaps/-config-map. Message: 
> ConfigMap "-config-map" is invalid: []: Too long: must have at most 
> 1048576 bytes. Received status: Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=[], message=Too long: must 
> have at most 1048576 bytes, reason=FieldValueTooLong, 
> additionalProperties={})], group=null, kind=ConfigMap, name=-config-map, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=ConfigMap "-config-map" is invalid: []: Too long: must have at 
> most 1048576 bytes, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:347)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:327)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:781)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:183)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:188)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:130)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:41)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$attemptCheckAndUpdateConfigMap$11(Fabric8FlinkKubeClient.java:325)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>  ~[?:?]
> ... 3 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working

2023-05-04 Thread Sriram Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719329#comment-17719329
 ] 

Sriram Ganesh commented on FLINK-31135:
---

In my case, it is not a permission issue. I couldn't reproduce it. My hunch 
is that there was a network issue during that time. 

Thanks, [~Swathi Chandrashekar] [~zhihaochen]. I am closing it.

> ConfigMap DataSize went > 1 MB and cluster stopped working
> --
>
> Key: FLINK-31135
> URL: https://issues.apache.org/jira/browse/FLINK-31135
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.2.0
>Reporter: Sriram Ganesh
>Priority: Major
> Attachments: dump_cm.yaml, 
> flink--kubernetes-application-0-parked-logs-ingestion-644b80-b4bc58747-lc865.log.zip,
>  image-2023-04-19-09-48-19-089.png, image-2023-05-03-13-47-51-440.png, 
> image-2023-05-03-13-50-54-783.png, image-2023-05-03-13-51-21-685.png, 
> jobmanager_log.txt, 
> parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml
>
>
> I am using the Flink Operator to manage clusters. Flink version: 1.15.2. Flink 
> jobs failed with the below error. It seems the ConfigMap size went beyond 1 MB 
> (the default limit). 
> Since it is managed by the operator and config maps are not updated with any 
> manual intervention, I suspect it could be an operator issue. 
>  
> {code:java}
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: PUT at: 
> https:///api/v1/namespaces//configmaps/-config-map. Message: 
> ConfigMap "-config-map" is invalid: []: Too long: must have at most 
> 1048576 bytes. Received status: Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=[], message=Too long: must 
> have at most 1048576 bytes, reason=FieldValueTooLong, 
> additionalProperties={})], group=null, kind=ConfigMap, name=-config-map, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=ConfigMap "-config-map" is invalid: []: Too long: must have at 
> most 1048576 bytes, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:347)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:327)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:781)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:183)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:188)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:130)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:41)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$attemptCheckAndUpdateConfigMap$11(Fabric8FlinkKubeClient.java:325)
>  ~[flink-dist-1.15.2.jar:1.15.2]
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>  ~[?:?]
> ... 3 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] reswqa commented on pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


reswqa commented on PR #22447:
URL: https://github.com/apache/flink/pull/22447#issuecomment-1534797190

   Squashed the fix-up commit, let's wait for the CI to turn green.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa commented on a diff in pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


reswqa commented on code in PR #22447:
URL: https://github.com/apache/flink/pull/22447#discussion_r1185020735


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer/LocalBufferPool.java:
##
@@ -460,17 +454,25 @@ private boolean requestMemorySegmentFromGlobal() {
 private MemorySegment requestOverdraftMemorySegmentFromGlobal() {
 assert Thread.holdsLock(availableMemorySegments);
 
-if (numberOfRequestedOverdraftMemorySegments >= 
maxOverdraftBuffersPerGate) {
+// if overdraft buffers(i.e. buffers exceeding poolSize) is greater 
than or equal to
+// maxOverdraftBuffersPerGate, no new buffer can be requested.
+if (numberOfRequestedMemorySegments - currentPoolSize >= 
maxOverdraftBuffersPerGate) {
 return null;
 }
 
 checkState(
 !isDestroyed,
 "Destroyed buffer pools should never acquire segments - this 
will lead to buffer leaks.");

Review Comment:
   Ah, good catch! I somehow missed this. PR updated.
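
   For illustration, a minimal sketch of the invariant behind the new check 
(the values are made up):
   ```java
   // Hypothetical illustration: overdraft buffers are simply the requested
   // segments beyond the current pool size, so no separate counter is needed.
   public class OverdraftCheck {
       public static void main(String[] args) {
           int numberOfRequestedMemorySegments = 12;
           int currentPoolSize = 10;
           int maxOverdraftBuffersPerGate = 2;
           boolean canRequestOverdraft =
                   numberOfRequestedMemorySegments - currentPoolSize
                           < maxOverdraftBuffersPerGate;
           // false: both allowed overdraft buffers are already in use
           System.out.println(canRequestOverdraft);
       }
   }
   ```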



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] akalash commented on a diff in pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


akalash commented on code in PR #22447:
URL: https://github.com/apache/flink/pull/22447#discussion_r118500


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer/LocalBufferPool.java:
##
@@ -460,17 +454,25 @@ private boolean requestMemorySegmentFromGlobal() {
 private MemorySegment requestOverdraftMemorySegmentFromGlobal() {
 assert Thread.holdsLock(availableMemorySegments);
 
-if (numberOfRequestedOverdraftMemorySegments >= 
maxOverdraftBuffersPerGate) {
+// if overdraft buffers(i.e. buffers exceeding poolSize) is greater 
than or equal to
+// maxOverdraftBuffersPerGate, no new buffer can be requested.
+if (numberOfRequestedMemorySegments - currentPoolSize >= 
maxOverdraftBuffersPerGate) {
 return null;
 }
 
 checkState(
 !isDestroyed,
 "Destroyed buffer pools should never acquire segments - this 
will lead to buffer leaks.");

Review Comment:
   I think you can move `checkState` to your newly created method 
`requestPooledMemorySegment` as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-32001) SupportsRowLevelUpdate does not support returning only a part of the columns.

2023-05-04 Thread Ming Li (Jira)
Ming Li created FLINK-32001:
---

 Summary: SupportsRowLevelUpdate does not support returning only a 
part of the columns.
 Key: FLINK-32001
 URL: https://issues.apache.org/jira/browse/FLINK-32001
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Affects Versions: 1.17.0
Reporter: Ming Li


[FLIP-282|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061]
 introduces the new Delete and Update API in Flink SQL. The documentation 
states that in the {{partial-update}} case we only need to return the primary 
key columns and the updated columns.

But in fact, the topology of the job is {{source -> cal -> constraintEnforcer 
-> sink}}, and the constraint check is performed in the {{constraintEnforcer}} 
operator by column index, not by column name. If only some columns are 
returned, the constraint check is wrong and easily produces an 
{{ArrayIndexOutOfBoundsException}}.
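
A minimal illustration of why an index-based check breaks on partial rows (the 
values are hypothetical):
{code:java}
// Hypothetical illustration: a NOT NULL check driven by indexes of the
// full schema fails when the row only carries the PK and updated columns.
public class IndexedConstraintCheck {
    public static void main(String[] args) {
        int[] notNullColumnIndexes = {0, 3};      // derived from the full schema
        Object[] partialRow = {1001L, "updated"}; // only PK + the updated column
        for (int idx : notNullColumnIndexes) {
            // idx = 3 exceeds partialRow.length -> ArrayIndexOutOfBoundsException
            if (partialRow[idx] == null) {
                throw new IllegalStateException("Column " + idx + " must not be null");
            }
        }
    }
}
{code}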



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] reswqa commented on pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool

2023-05-04 Thread via GitHub


reswqa commented on PR #22447:
URL: https://github.com/apache/flink/pull/22447#issuecomment-1534754347

   Thanks @akalash for carefully confirming this change! In the fix-up commit, I 
have extracted the logic for requesting a buffer from the global pool and 
increasing `numberOfRequestedMemorySegments` into a common method. 
PTAL~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31560) Savepoint failing to complete with ExternallyInducedSources

2023-05-04 Thread Brian Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719314#comment-17719314
 ] 

Brian Zhou commented on FLINK-31560:


[~Yanfei Lei] I think it's fair for the community not to fix an API that is 
planned for deprecation for now. This issue only affects the legacy source, 
and having discussed it with the team on the Pravega side, we are also 
transitioning to the FLIP-27 source, so we will have a way out as well.

> Savepoint failing to complete with ExternallyInducedSources
> ---
>
> Key: FLINK-31560
> URL: https://issues.apache.org/jira/browse/FLINK-31560
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.16.0
>Reporter: Fan Yang
>Assignee: Yanfei Lei
>Priority: Major
> Attachments: image-2023-03-23-18-03-05-943.png, 
> image-2023-03-23-18-19-24-482.png, jobmanager_log.txt, 
> taskmanager_172.28.17.19_6123-f2dbff_log, 
> tmp_tm_172.28.17.19_6123-f2dbff_tmp_job_83ad4f408d0e7bf30f940ddfa5fe00e3_op_WindowOperator_137df028a798f504a6900a4081c9990c__1_1__uuid_edc681f0-3825-45ce-a123-9ff69ce6d8f1_db_LOG
>
>
> Flink version: 1.16.0
>  
> We are using Flink to run some streaming applications with Pravega as source 
> and use window and reduce transformations. We use RocksDB state backend with 
> incremental checkpointing enabled. We don't enable the latest changelog state 
> backend.
> When we try to stop the job, we encounter issues with the savepoint failing 
> to complete for the job. This happens most of the time. On rare occasions, 
> the job gets canceled suddenly with its savepoint get completed successfully.
> Savepointing shows below error:
>  
> {code:java}
> 2023-03-22 08:55:57,521 [jobmanager-io-thread-1] WARN  
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
> trigger or complete checkpoint 189 for job 7354442cd6f7c121249360680c04284d. 
> (0 consecutive failed attempts so 
> far)org.apache.flink.runtime.checkpoint.CheckpointException: Failure to 
> finalize checkpoint.    at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1375)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]    at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.io.IOException: Unknown implementation of StreamStateHandle: 
> class org.apache.flink.runtime.state.PlaceholderStreamStateHandle    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeStreamStateHandle(MetadataV2V3SerializerBase.java:699)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeStreamStateHandleMap(MetadataV2V3SerializerBase.java:813)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeKeyedStateHandle(MetadataV2V3SerializerBase.java:344)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeKeyedStateCol(MetadataV2V3SerializerBase.java:269)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeSubtaskState(MetadataV2V3SerializerBase.java:262)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV3Serializer.serializeSubtaskState(MetadataV3Serializer.java:142)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV3Serializer.serializeOperatorState(MetadataV3Serializer.java:122)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV2V3SerializerBase.serializeMetadata(MetadataV2V3SerializerBase.java:146)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> org.apache.flink.runtime.checkpoint.metadata.MetadataV3Serializer.serialize(MetadataV3Serializer.java:83)
>  ~[flink-dist-1.16.0.jar:1.16.0]    at 
> 

[GitHub] [flink-connector-jdbc] reswqa commented on a diff in pull request #45: [hotfix] Fix incorrect sql_url in jdbc.yml.

2023-05-04 Thread via GitHub


reswqa commented on code in PR #45:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/45#discussion_r1184966872


##
docs/data/jdbc.yml:
##
@@ -18,4 +18,4 @@
 
 variants:
   - maven: flink-connector-jdbc
-sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-jdbc/$full_version/flink-sql-connector-jdbc-$full_version.jar
+sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc/$full_version/flink-connector-jdbc-$full_version.jar

Review Comment:
   IIRC, `v3.1.0` has not been released yet. So I guess you mean that the 
version of the JDBC connector supporting `flink-1.17` will be `v3.1.0`.
   
   At present, the `release-1.17` branch of flink tracks the `v3.0` branch of 
the `flink-connector-jdbc` repository. 
   
https://github.com/apache/flink-connector-jdbc/blob/b07bddfc8d741e203d3d9b6eeefe5113ecfb4acc/docs/content/docs/connectors/table/jdbc.md?plain=1#L41
   This line in the `v3.0` branch indicates that the connector version is `v3.0.0`, 
which causes `$full_version` to be replaced with `3.0.0-1.17`. If we want to 
release `v3.1.0`, updating the version number here should be a necessary part 
of the release process.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-jdbc] reswqa commented on a diff in pull request #45: [hotfix] Fix incorrect sql_url in jdbc.yml.

2023-05-04 Thread via GitHub


reswqa commented on code in PR #45:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/45#discussion_r1184966872


##
docs/data/jdbc.yml:
##
@@ -18,4 +18,4 @@
 
 variants:
   - maven: flink-connector-jdbc
-sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-jdbc/$full_version/flink-sql-connector-jdbc-$full_version.jar
+sql_url: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc/$full_version/flink-connector-jdbc-$full_version.jar

Review Comment:
   IIRC, `v3.1.0` has not been released yet. So I guess you mean that the 
version of the JDBC connector supporting `flink-1.17` will be `v3.1.0`.
   
   At present, the `release-1.17` branch of flink tracks the `v3.0` branch of 
the `flink-connector-jdbc` repository. 
   
https://github.com/apache/flink-connector-jdbc/blob/b07bddfc8d741e203d3d9b6eeefe5113ecfb4acc/docs/content/docs/connectors/table/jdbc.md?plain=1#L41
   This line indicates that the connector version is `v3.0.0`, which causes 
`$full_version` to be replaced with `3.0.0-1.17`. If we want to release 
`v3.1.0`, updating the version number here should be a necessary part of the 
release process.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31984) Savepoint on S3 should be relocatable if entropy injection is not effective

2023-05-04 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719309#comment-17719309
 ] 

Xintong Song commented on FLINK-31984:
--

Thanks for fixing it, [~mapohl]. And sorry for causing this trouble. I'm a bit 
surprised that {{EntropyInjector}} is an API class.

Now I see two options for moving forward: 1) we merge this only for 1.18, or 2) 
we try to fix this for 1.16 & 1.17 in a compatible way (preserving the old and 
likely unused public method). Given that this is an improvement, I'm leaning 
towards 1).

[~liuml07] WDYT?

> Savepoint on S3 should be relocatable if entropy injection is not effective
> ---
>
> Key: FLINK-31984
> URL: https://issues.apache.org/jira/browse/FLINK-31984
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems, Runtime / Checkpointing
>Affects Versions: 1.16.1
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> We have a limitation that if we create savepoints with an injected entropy, 
> they are not relocatable 
> (https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#triggering-savepoints).
> FLINK-25952 improves the check by inspecting both the FileSystem extending 
> {{EntropyInjectingFileSystem}} and 
> {{FlinkS3FileSystem#getEntropyInjectionKey}} not returning null. We can 
> improve this further by checking that the checkpoint path indeed uses the 
> entropy injection key. Without that, the savepoint is not relocatable even if 
> the {{state.savepoints.dir}} does not contain the entropy.
> In our setting, we enable entropy injection by setting {{s3.entropy.key}} to 
>  {{\__ENTROPY_KEY\__}} and use the entropy key in the checkpoint path (for 
> e.g. {{s3://mybuket/checkpoints/__ENTROPY_KEY__/myapp}}). However, in the 
> savepoint path, we don't use the entropy key (for e.g. 
> {{s3://mybuket/savepoints/myapp}}) because we want the savepoint to be 
> relocatable. But the current logic still generates a non-relocatable savepoint 
> path just because the entropy injection key is non-null.
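> 
> A minimal sketch of the refined check under these assumptions (illustrative, 
> not the actual {{EntropyInjector}} code):
> {code:java}
> // Hypothetical sketch: a savepoint is only non-relocatable when entropy
> // injection is actually effective, i.e. the key appears in the path.
> public class EntropyCheck {
>     static boolean isRelocatable(String savepointPath, String entropyKey) {
>         return entropyKey == null || !savepointPath.contains(entropyKey);
>     }
> 
>     public static void main(String[] args) {
>         String key = "__ENTROPY_KEY__";
>         // true: the savepoint dir does not contain the entropy key
>         System.out.println(isRelocatable("s3://mybuket/savepoints/myapp", key));
>         // false: entropy injection is effective for this path
>         System.out.println(
>                 isRelocatable("s3://mybuket/checkpoints/__ENTROPY_KEY__/myapp", key));
>     }
> }
> {code}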



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] fredia commented on pull request #22458: [FLINK-31743][statebackend/rocksdb] disable rocksdb log relocating wh…

2023-05-04 Thread via GitHub


fredia commented on PR #22458:
URL: https://github.com/apache/flink/pull/22458#issuecomment-1534698282

   @zoltar9264 Thanks for the PR, should the description of 
[`state.backend.rocksdb.log.dir`](https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBConfigurableOptions.java#L90)
 be changed accordingly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #22521: [FLINK-32000][webui] Expose max parallelism in the vertex detail drawer.

2023-05-04 Thread via GitHub


flinkbot commented on PR #22521:
URL: https://github.com/apache/flink/pull/22521#issuecomment-1534678585

   
   ## CI report:
   
   * 019520d0ff3f7c88559fe35ecde57aa01bfeada9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-32000) Expose vertex max parallelism in the WebUI

2023-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek reassigned FLINK-32000:
-

Assignee: David Morávek

> Expose vertex max parallelism in the WebUI
> --
>
> Key: FLINK-32000
> URL: https://issues.apache.org/jira/browse/FLINK-32000
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Screenshot 2023-05-04 at 14.15.34.png
>
>
> It would be great to expose max parallelism in the vertex detail drawer for 
> debug purposes  !Screenshot 2023-05-04 at 14.15.34.png|width=533,height=195! .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32000) Expose vertex max parallelism in the WebUI

2023-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32000:
---
Labels: pull-request-available  (was: )

> Expose vertex max parallelism in the WebUI
> --
>
> Key: FLINK-32000
> URL: https://issues.apache.org/jira/browse/FLINK-32000
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: David Morávek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Screenshot 2023-05-04 at 14.15.34.png
>
>
> It would be great to expose max parallelism in the vertex detail drawer for 
> debug purposes  !Screenshot 2023-05-04 at 14.15.34.png|width=533,height=195! .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dmvk opened a new pull request, #22521: [FLINK-32000][webui] Expose max parallelism in the vertex detail drawer.

2023-05-04 Thread via GitHub


dmvk opened a new pull request, #22521:
URL: https://github.com/apache/flink/pull/22521

   https://issues.apache.org/jira/browse/FLINK-32000


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32000) Expose vertex max parallelism in the WebUI

2023-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-32000:
--
Description: It would be great to expose max parallelism in the vertex 
detail drawer for debug purposes  !Screenshot 2023-05-04 at 
14.15.34.png|width=533,height=195! .  (was: It would be great to expose max 
parallelism in the vertex detail drawer for debug purposes !Screenshot 
2023-05-04 at 14.15.34.png! .)

> Expose vertex max parallelism in the WebUI
> --
>
> Key: FLINK-32000
> URL: https://issues.apache.org/jira/browse/FLINK-32000
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: David Morávek
>Priority: Minor
> Attachments: Screenshot 2023-05-04 at 14.15.34.png
>
>
> It would be great to expose max parallelism in the vertex detail drawer for 
> debug purposes  !Screenshot 2023-05-04 at 14.15.34.png|width=533,height=195! .





[jira] [Created] (FLINK-32000) Expose vertex max parallelism in the WebUI

2023-05-04 Thread Jira
David Morávek created FLINK-32000:
-

 Summary: Expose vertex max parallelism in the WebUI
 Key: FLINK-32000
 URL: https://issues.apache.org/jira/browse/FLINK-32000
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Reporter: David Morávek
 Attachments: Screenshot 2023-05-04 at 14.15.34.png

It would be great to expose max parallelism in the vertex detail drawer for 
debug purposes !Screenshot 2023-05-04 at 14.15.34.png! .
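
Since the drawer is populated from the job's REST payload, the change mostly amounts to surfacing one more vertex field. A minimal sketch of such a payload fragment follows, assuming plain Jackson bindings; the class name, the -1 fallback convention, and the presence of a maxParallelism field in the response are illustrative assumptions, not the actual flink-runtime-web code:

{code:java}
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

/** Hypothetical vertex payload fragment; all names here are assumptions. */
public final class VertexDetailFragment {

    @JsonProperty("parallelism")
    private final int parallelism;

    // The field the detail drawer would surface; -1 stands for "not set".
    @JsonProperty("maxParallelism")
    private final int maxParallelism;

    @JsonCreator
    public VertexDetailFragment(
            @JsonProperty("parallelism") int parallelism,
            @JsonProperty("maxParallelism") int maxParallelism) {
        this.parallelism = parallelism;
        this.maxParallelism = maxParallelism;
    }

    /** Value the drawer would render, falling back to "-" when unset. */
    public String displayMaxParallelism() {
        return maxParallelism > 0 ? String.valueOf(maxParallelism) : "-";
    }
}
{code}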





[GitHub] [flink] XComp commented on pull request #22380: [FLINK-31773][runtime] Separate DefaultLeaderElectionService.start(LeaderContender) into two separate methods for starting the driver and regis

2023-05-04 Thread via GitHub


XComp commented on PR #22380:
URL: https://github.com/apache/flink/pull/22380#issuecomment-1534661805

   Rebased onto the most recent version of FLINK-31838 (PR #22422).





[GitHub] [flink] flinkbot commented on pull request #22520: [FLINK-31996] Chaining operators with different max parallelism prevents rescaling

2023-05-04 Thread via GitHub


flinkbot commented on PR #22520:
URL: https://github.com/apache/flink/pull/22520#issuecomment-1534654730

   
   ## CI report:
   
   * 8c07ee4ee759fe351480649f43a359bf0881e4c9 UNKNOWN
   
   
   Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] WencongLiu commented on a diff in pull request #22316: [FLINK-31638][network] Downstream supports reading buffers from tiered store

2023-05-04 Thread via GitHub


WencongLiu commented on code in PR #22316:
URL: https://github.com/apache/flink/pull/22316#discussion_r1184921505


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateBufferReaderAdapter.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition.consumer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Optional;
+
+/**
+ * {@link SingleInputGateBufferReaderAdapter} includes the logic of reading buffers in the
+ * {@link SingleInputGate}.
+ */
+public interface SingleInputGateBufferReaderAdapter extends Closeable {

Review Comment:
   The SingleInputGateBufferReaderAdapter no longer exists.






[GitHub] [flink] WencongLiu commented on a diff in pull request #22316: [FLINK-31638][network] Downstream supports reading buffers from tiered store

2023-05-04 Thread via GitHub


WencongLiu commented on code in PR #22316:
URL: https://github.com/apache/flink/pull/22316#discussion_r1184920929


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/tier/TierConsumerAgent.java:
##
@@ -18,5 +18,21 @@
 
 package org.apache.flink.runtime.io.network.partition.hybrid.tiered.tier;
 
+import org.apache.flink.runtime.io.network.partition.consumer.InputChannel;
+
+import java.io.Closeable;
+import java.util.Optional;
+
 /** The consumer-side agent of a Tier. */
-public interface TierConsumerAgent {}
+public interface TierConsumerAgent extends Closeable {
+
+/**
+ * Gets a buffer from the client for the given subpartition and segment id.
+ *
+ * @param inputChannel indicates the subpartition to read.
+ * @param segmentId indicates the id of the segment.
+ * @return the next buffer.
+ */
+Optional<InputChannel.BufferAndAvailability> getNextBuffer(
+InputChannel inputChannel, int segmentId);

Review Comment:
   InputChannel has been removed. BufferAndAvailability is essentially the combination of the information that SingleInputGate requires. Maybe the public class BufferAndAvailability can be moved out of InputChannel?
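
   A minimal sketch of what hoisting the holder to a top-level class might look like; the field set here is assumed and simplified for illustration, not the actual contents of InputChannel.BufferAndAvailability:

   ```java
   import org.apache.flink.runtime.io.network.buffer.Buffer;

   /** Hypothetical top-level holder, so tier consumer agents need not depend on InputChannel. */
   public final class BufferAndAvailability {

       private final Buffer buffer;        // the payload read from the tier
       private final int buffersInBacklog; // how many buffers are still queued behind it
       private final int sequenceNumber;   // ordering within the subpartition

       public BufferAndAvailability(Buffer buffer, int buffersInBacklog, int sequenceNumber) {
           this.buffer = buffer;
           this.buffersInBacklog = buffersInBacklog;
           this.sequenceNumber = sequenceNumber;
       }

       public Buffer buffer() {
           return buffer;
       }

       public int buffersInBacklog() {
           return buffersInBacklog;
       }

       public int sequenceNumber() {
           return sequenceNumber;
       }
   }
   ```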



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/tier/TierConsumerAgent.java:
##
@@ -18,5 +18,21 @@
 
 package org.apache.flink.runtime.io.network.partition.hybrid.tiered.tier;
 
+import org.apache.flink.runtime.io.network.partition.consumer.InputChannel;
+
+import java.io.Closeable;
+import java.util.Optional;
+
 /** The consumer-side agent of a Tier. */
-public interface TierConsumerAgent {}
+public interface TierConsumerAgent extends Closeable {
+
+/**
+ * Gets a buffer from the client for the given subpartition and segment id.
+ *
+ * @param inputChannel indicates the subpartition to read.
+ * @param segmentId indicates the id of the segment.
+ * @return the next buffer.
+ */
+Optional<InputChannel.BufferAndAvailability> getNextBuffer(
+InputChannel inputChannel, int segmentId);

Review Comment:
   Done.
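
   Putting the two review threads together, a hedged sketch of the reworked contract; replacing InputChannel with a plain subpartition id is an assumption drawn from the comments above, not the merged code:

   ```java
   import java.io.Closeable;
   import java.util.Optional;

   /** Hypothetical shape of the agreed-on interface; parameter names are assumptions. */
   public interface TierConsumerAgent extends Closeable {

       /**
        * Returns the next buffer of the given subpartition and segment, or
        * {@code Optional.empty()} if none is currently available.
        * Uses the top-level BufferAndAvailability sketched in the previous comment.
        */
       Optional<BufferAndAvailability> getNextBuffer(int subpartitionId, int segmentId);
   }
   ```

   A caller could then poll getNextBuffer in a loop and treat an empty Optional as "no buffer available for this segment yet", which sidesteps the InputChannel dependency the first review comment objected to.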





