Re: [PR] [FLINK-35184][table-runtime] fix minibatch join hash collision release-1.19 [flink]

2024-05-01 Thread via GitHub


rovboyko commented on PR #24749:
URL: https://github.com/apache/flink/pull/24749#issuecomment-2089490837

   Rerunning the Azure pipeline because the previous run hit this bug:
https://issues.apache.org/jira/browse/FLINK-34405


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-35184][table-runtime] fix minibatch join hash collision release-1.19 [flink]

2024-05-01 Thread via GitHub


rovboyko commented on PR #24749:
URL: https://github.com/apache/flink/pull/24749#issuecomment-2089488999

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33761][Connector/JDBC] Add Snowflake JDBC Dialect [flink-connector-jdbc]

2024-05-01 Thread via GitHub


borislitvak commented on code in PR #118:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/118#discussion_r1586816863


##
flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/databases/snowflake/dialect/SnowflakeDialectFactory.java:
##
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.jdbc.databases.snowflake.dialect;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.connector.jdbc.dialect.JdbcDialect;
+import org.apache.flink.connector.jdbc.dialect.JdbcDialectFactory;
+
+/** Factory for {@link SnowflakeDialect}. */
+@Internal
+public class SnowflakeDialectFactory implements JdbcDialectFactory {
+    @Override
+    public boolean acceptsURL(String url) {
+        return url.startsWith("jdbc:snowflake:");
+    }
+
+    @Override
+    public JdbcDialect create() {

Review Comment:
   Docs pushed as well. Please take a look @davidradl 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33212][runtime] add job status changed listener for lineage [flink]

2024-05-01 Thread via GitHub


flinkbot commented on PR #24754:
URL: https://github.com/apache/flink/pull/24754#issuecomment-2088785395

   
   ## CI report:
   
   * 6b5b8889d2417c2510b1372a32c332bc3962cb99 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33212) Introduce job status changed listener for lineage

2024-05-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33212:
---
Labels: pull-request-available  (was: )

> Introduce job status changed listener for lineage
> -
>
> Key: FLINK-33212
> URL: https://issues.apache.org/jira/browse/FLINK-33212
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST
>Affects Versions: 1.20.0
>Reporter: Fang Yong
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Introduce the interfaces relevant to the job status changed listener



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33212][runtime] add job status changed listener for lineage [flink]

2024-05-01 Thread via GitHub


HuangZhenQiu opened a new pull request, #24754:
URL: https://github.com/apache/flink/pull/24754

   ## What is the purpose of the change
   Add a job status changed listener for lineage. This PR is also meant to start a discussion
with the community about whether to add the lineage graph to the JobGraph for the web
submission and failure recovery cases (see the illustrative sketch below).
   
   ## Brief change log
 -  Add interfaces for the job status change listener.
 -  Add events and configs for the job status listener.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
 - The end to end test is covered by JobStatusListenerITCase 
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (not documented)
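
   For readers following the discussion, here is a minimal, hedged sketch of what a job status
changed listener interface could look like. All names below (JobStatusChangedListener,
JobStatusChangedEvent, onEvent) are illustrative assumptions and not necessarily the interfaces
introduced by this PR:

{code:java}
package org.apache.flink.core.execution;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.JobStatus;

/** Illustrative sketch only; the interfaces added by the PR may differ. */
public interface JobStatusChangedListener {

    /** Invoked whenever the status of a job changes, e.g. CREATED -> RUNNING -> FINISHED. */
    void onEvent(JobStatusChangedEvent event);

    /** Hypothetical event describing a single status transition of a job. */
    interface JobStatusChangedEvent {
        JobID jobId();

        String jobName();

        JobStatus oldStatus();

        JobStatus newStatus();
    }
}
{code}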
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-35280) Migrate HBase Sink connector to use the Async Sink API

2024-05-01 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-35280:
--

 Summary: Migrate HBase Sink connector to use the Async Sink API
 Key: FLINK-35280
 URL: https://issues.apache.org/jira/browse/FLINK-35280
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / HBase
Affects Versions: hbase-3.0.0, hbase-3.0.1, hbase-4.0.0
Reporter: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33761][Connector/JDBC] Add Snowflake JDBC Dialect [flink-connector-jdbc]

2024-05-01 Thread via GitHub


borislitvak commented on code in PR #118:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/118#discussion_r1586364059


##
flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/databases/snowflake/dialect/SnowflakeDialectFactory.java:
##
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.jdbc.databases.snowflake.dialect;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.connector.jdbc.dialect.JdbcDialect;
+import org.apache.flink.connector.jdbc.dialect.JdbcDialectFactory;
+
+/** Factory for {@link SnowflakeDialect}. */
+@Internal
+public class SnowflakeDialectFactory implements JdbcDialectFactory {
+    @Override
+    public boolean acceptsURL(String url) {
+        return url.startsWith("jdbc:snowflake:");
+    }
+
+    @Override
+    public JdbcDialect create() {

Review Comment:
   Pushed a unit test, docs are still WIP
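
   The unit test itself is not shown in this thread. As a rough, hedged sketch of what testing
the factory's URL check could look like (the class name, test method, and example URLs below are
assumptions, not the PR's actual test):

{code:java}
package org.apache.flink.connector.jdbc.databases.snowflake.dialect;

import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

/** Illustrative sketch only; not the test actually added in the PR. */
class SnowflakeDialectFactoryTest {

    @Test
    void testAcceptsSnowflakeJdbcUrls() {
        SnowflakeDialectFactory factory = new SnowflakeDialectFactory();

        // The factory should claim Snowflake JDBC URLs ...
        assertThat(factory.acceptsURL("jdbc:snowflake://myaccount.snowflakecomputing.com/")).isTrue();
        // ... and reject URLs belonging to other dialects.
        assertThat(factory.acceptsURL("jdbc:postgresql://localhost:5432/testdb")).isFalse();
    }
}
{code}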



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-35192) operator oom

2024-05-01 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842607#comment-17842607
 ] 

Gyula Fora commented on FLINK-35192:


Merged b1f4b3f35f907e1f6425ad2401dbf24309e6fb59 to main

> operator oom
> 
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> The Kubernetes operator Docker process was killed by the kernel due to out of
> memory (at 2024-04-03 18:16).
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator do not show any obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35192) operator oom

2024-05-01 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-35192:
---
Fix Version/s: kubernetes-operator-1.9.0

> operator oom
> 
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> The Kubernetes operator Docker process was killed by the kernel due to out of
> memory (at 2024-04-03 18:16).
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator do not show any obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35192) Kubernetes operator oom

2024-05-01 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-35192:
---
Summary: Kubernetes operator oom  (was: operator oom)

> Kubernetes operator oom
> ---
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> The Kubernetes operator Docker process was killed by the kernel due to out of
> memory (at 2024-04-03 18:16).
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator do not show any obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35192] Remove usage of deleteOnExit() to reduce the memory usage of the operator [flink-kubernetes-operator]

2024-05-01 Thread via GitHub


gyfora merged PR #822:
URL: https://github.com/apache/flink-kubernetes-operator/pull/822
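
For context on the change merged above: java.io.File#deleteOnExit registers each path in the
JVM-wide DeleteOnExitHook, which only releases its entries at shutdown, so a long-running
operator process that keeps creating temporary files accumulates memory over time. A hedged
sketch of the general alternative (illustrative only, not the operator's actual code) is to
delete temporary files eagerly:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch: delete temporary files eagerly instead of relying on File#deleteOnExit. */
public final class EagerTempFileCleanup {

    public static void main(String[] args) throws IOException {
        Path tempFile = Files.createTempFile("operator-", ".tmp");
        try {
            Files.writeString(tempFile, "scratch data");
            // ... work with the file ...
        } finally {
            // Explicit deletion releases the path right away; deleteOnExit() would keep the
            // path referenced in DeleteOnExitHook until the (long-lived) JVM finally exits.
            Files.deleteIfExists(tempFile);
        }
    }
}
{code}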


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31223][sqlgateway] Introduce getFlinkConfigurationOptions to [flink]

2024-05-01 Thread via GitHub


davidradl commented on PR #24741:
URL: https://github.com/apache/flink/pull/24741#issuecomment-2088287202

   @reswqa JUnit errors in the PR build - I will investigate


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31223][sqlgateway] Introduce getFlinkConfigurationOptions to [flink]

2024-05-01 Thread via GitHub


davidradl commented on PR #24741:
URL: https://github.com/apache/flink/pull/24741#issuecomment-2088277865

   @flinkbot run azure
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-35278) Occasional NPE on k8s operator status replacement

2024-05-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-35278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi reassigned FLINK-35278:
--

Assignee: Ferenc Csaky

> Occasional NPE on k8s operator status replacement
> -
>
> Key: FLINK-35278
> URL: https://issues.apache.org/jira/browse/FLINK-35278
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Márton Balassi
>Assignee: Ferenc Csaky
>Priority: Minor
> Fix For: kubernetes-operator-1.9.0
>
>
> Infrequently we get a null pointer exception on status replacement:
> {noformat}
> logger: io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher
> message: Error during error status handling.   
> throwable: { [-]
>  class: java.lang.NullPointerException
>  msg: null
>  stack: [ [-]
>
> org.apache.flink.kubernetes.operator.utils.StatusRecorder.replaceStatus(StatusRecorder.java:136)
>
> org.apache.flink.kubernetes.operator.utils.StatusRecorder.patchAndCacheStatus(StatusRecorder.java:97)
>
> org.apache.flink.kubernetes.operator.reconciler.ReconciliationUtils.toErrorStatusUpdateControl(ReconciliationUtils.java:438)
>
> org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.updateErrorStatus(FlinkDeploymentController.java:213)
>
> org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.updateErrorStatus(FlinkDeploymentController.java:60)
>
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleErrorStatusHandler(ReconciliationDispatcher.java:194)
>
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:123)
>
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
>
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
>
> io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>java.lang.Thread.run(Thread.java:829)
>  ]
>}{noformat}
> I suspect it probably is thrown by getResourceVersion() here:
> [https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/StatusRecorder.java#L136]
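
A hedged sketch of the kind of null-safe guard that would avoid the suspected NPE (illustrative
only; the class and method below are assumptions, not the operator's actual fix):

{code:java}
import io.fabric8.kubernetes.api.model.HasMetadata;

/** Illustrative sketch only; not the actual StatusRecorder change. */
final class ResourceVersions {

    /** Returns the resourceVersion of a resource, or null if its metadata is not populated yet. */
    static String resourceVersionOrNull(HasMetadata resource) {
        // The stack trace suggests the NPE happens when getResourceVersion() is read
        // unconditionally; guarding the metadata access avoids that.
        return (resource == null || resource.getMetadata() == null)
                ? null
                : resource.getMetadata().getResourceVersion();
    }
}
{code}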



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35217) Missing fsync in FileSystemCheckpointStorage

2024-05-01 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842587#comment-17842587
 ] 

Roman Khachatryan commented on FLINK-35217:
---

Backported

to 1.18 as e6726d3b962383d9a2576fe117d7566b205f514a and

to 1.19 as ac4aa35c6e2e2da87760ffbf45d85888b1976c2f.

> Missing fsync in FileSystemCheckpointStorage
> 
>
> Key: FLINK-35217
> URL: https://issues.apache.org/jira/browse/FLINK-35217
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Runtime / Checkpointing
>Affects Versions: 1.17.0, 1.18.0, 1.19.0
>Reporter: Marc Aurel Fritz
>Assignee: Stefan Richter
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> While running Flink on a system with an unstable power supply, checkpoints were 
> regularly corrupted in the form of "_metadata" files with a file size of 0 
> bytes. In all cases the previous checkpoint data had already been deleted, 
> causing progress to be lost completely.
> Further investigation revealed that the "FileSystemCheckpointStorage" doesn't 
> perform "fsync" when writing a new checkpoint to disk. This means the old 
> checkpoint gets removed without making sure that the new one is durably 
> persisted on disk. "strace" on the jobmanager's process confirms this 
> behavior:
>  # The checkpoint chk-60's in-progress metadata is written at "openat"
>  # The checkpoint chk-60's in-progress metadata is atomically renamed at 
> "rename"
>  # The old checkpoint chk-59 is deleted at "unlink"
> For durable persistence an "fsync" call is missing before step 3.
> Full "strace" log:
> {code:java}
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> 0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> 0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc", 
> {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 51618] 11:44:30 
> mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777) 
> = 0
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
>  0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
>  0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51618] 11:44:30 
> rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = > 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata",
>  {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51644] 11:44:30 
> unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata")
>  = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
> ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> 

Re: [PR] [FLINK-35232] Add retry settings for GCS connector [flink]

2024-05-01 Thread via GitHub


flinkbot commented on PR #24753:
URL: https://github.com/apache/flink/pull/24753#issuecomment-2088082229

   
   ## CI report:
   
   * 1fb740aaac47908e4162fb8cef213c965841288f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-35232) Support for retry settings on GCS connector

2024-05-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35232:
---
Labels: pull-request-available  (was: )

> Support for retry settings on GCS connector
> ---
>
> Key: FLINK-35232
> URL: https://issues.apache.org/jira/browse/FLINK-35232
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.15.3, 1.16.2, 1.17.1, 1.19.0, 1.18.1
>Reporter: Vikas M
>Assignee: Ravi Singh
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/FLINK-32877 is tracking the ability to 
> specify transport options in the GCS connector. While setting the params enabled 
> there reduced read timeouts, we still see 503 errors leading to Flink job 
> restarts.
> Thus, in this ticket, we want to specify additional retry settings as noted 
> in 
> [https://cloud.google.com/storage/docs/retry-strategy#customize-retries.|https://cloud.google.com/storage/docs/retry-strategy#customize-retries]
> We need 
> [these|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#methods]
>  methods available for Flink users so that they can customize their 
> deployment. In particular, the following settings seem to be the minimum required to 
> align the GCS timeouts with the job's checkpoint config:
>  * 
> [maxAttempts|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxAttempts__]
>  * 
> [initialRpcTimeout|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getInitialRpcTimeout__]
>  * 
> [rpcTimeoutMultiplier|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getRpcTimeoutMultiplier__]
>  * 
> [maxRpcTimeout|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxRpcTimeout__]
>  * 
> [totalTimeout|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getTotalTimeout__]
>  
> Basically, the proposal is to be able to tune the timeout via the multiplier, 
> maxAttempts, and totalTimeout mechanisms.
> All of the config options should be optional, and the defaults should be 
> used in case some of the configs are not provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-35232] Add retry settings for GCS connector [flink]

2024-05-01 Thread via GitHub


JTaky opened a new pull request, #24753:
URL: https://github.com/apache/flink/pull/24753

   
   
   Provide a way to pass retry settings configuration to the gcs-cloud-storage 
library in use.
   
   
   ## Brief change log
   
   - Exposed the `gs.retry.max-attempt`, `gs.retry.init-rpc-timeout`, 
`gs.retry.rpc-timeout-multiplier`, and `gs.retry.total-timeout` configuration options.
   - Plumbed the retry settings configuration through to the gcs-cloud-storage client 
(see the illustrative sketch at the end of this description).
   - Updated the documentation for GCS.
   
   ## Verifying this change
   
   - This change is already covered by existing tests, such as 
GSRecoverableWriterTest*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
   Does this pull request introduce a new feature? yes
   If yes, how is the feature documented? docs
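
   To make the plumbing concrete: the gcs-cloud-storage client takes a
com.google.api.gax.retrying.RetrySettings instance via StorageOptions. Below is a hedged
sketch of how the exposed options could map onto that builder (illustrative only; the PR's
actual option names, defaults, and wiring may differ):

{code:java}
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.threeten.bp.Duration;

/** Illustrative sketch of wiring retry options into the GCS client; not the PR's actual code. */
final class GcsRetrySettingsExample {

    static Storage createStorage() {
        RetrySettings retrySettings =
                RetrySettings.newBuilder()
                        .setMaxAttempts(5) // e.g. gs.retry.max-attempt
                        .setInitialRpcTimeout(Duration.ofSeconds(30)) // e.g. gs.retry.init-rpc-timeout
                        .setRpcTimeoutMultiplier(1.5) // e.g. gs.retry.rpc-timeout-multiplier
                        .setMaxRpcTimeout(Duration.ofMinutes(2)) // cap on the per-RPC timeout
                        .setTotalTimeout(Duration.ofMinutes(5)) // e.g. gs.retry.total-timeout
                        .build();

        return StorageOptions.newBuilder()
                .setRetrySettings(retrySettings)
                .build()
                .getService();
    }
}
{code}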
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org