[jira] [Comment Edited] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-05-28 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849915#comment-17849915
 ] 

Hong Liang Teoh edited comment on FLINK-34379 at 5/28/24 7:31 AM:
--

[~jeyhunkarimov] Thanks for the fix! We should backport the patch to 1.19 + 
1.18 as well :)


was (Author: hong):
[~jeyhunkarimov] could we please backport the patch to 1.19 + 1.18 as well?

> table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
> --
>
> Key: FLINK-34379
> URL: https://issues.apache.org/jira/browse/FLINK-34379
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.2, 1.18.1
> Environment: 1.17.1
>Reporter: zhu
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.3, 1.18.2, 1.20.0, 1.19.1
>
>
> In batch mode, I UNION ALL about 50 tables and then join the result with 
> another table. While compiling the execution plan, an OutOfMemoryError: Java 
> heap space is thrown, which causes the jobmanager to restart. This was not a 
> problem in 1.15.2, but both 1.17.2 and 1.18.1 throw the same error. It has 
> been traced to table.optimizer.dynamic-filtering.enabled, which defaults to 
> true; when I set table.optimizer.dynamic-filtering.enabled to false, the 
> plan compiles and executes normally.
> Code:
> TableEnvironment.create(EnvironmentSettings.newInstance()
> .withConfiguration(configuration)
> .inBatchMode().build())
> sql=select att,filename,'table0' as mo_name from table0 UNION All select 
> att,filename,'table1' as mo_name from table1 UNION All select 
> att,filename,'table2' as mo_name from table2 UNION All select 
> att,filename,'table3' as mo_name from table3 UNION All select 
> att,filename,'table4' as mo_name from table4 UNION All select 
> att,filename,'table5' as mo_name from table5 UNION All select 
> att,filename,'table6' as mo_name from table6 UNION All select 
> att,filename,'table7' as mo_name from table7 UNION All select 
> att,filename,'table8' as mo_name from table8 UNION All select 
> att,filename,'table9' as mo_name from table9 UNION All select 
> att,filename,'table10' as mo_name from table10 UNION All select 
> att,filename,'table11' as mo_name from table11 UNION All select 
> att,filename,'table12' as mo_name from table12 UNION All select 
> att,filename,'table13' as mo_name from table13 UNION All select 
> att,filename,'table14' as mo_name from table14 UNION All select 
> att,filename,'table15' as mo_name from table15 UNION All select 
> att,filename,'table16' as mo_name from table16 UNION All select 
> att,filename,'table17' as mo_name from table17 UNION All select 
> att,filename,'table18' as mo_name from table18 UNION All select 
> att,filename,'table19' as mo_name from table19 UNION All select 
> att,filename,'table20' as mo_name from table20 UNION All select 
> att,filename,'table21' as mo_name from table21 UNION All select 
> att,filename,'table22' as mo_name from table22 UNION All select 
> att,filename,'table23' as mo_name from table23 UNION All select 
> att,filename,'table24' as mo_name from table24 UNION All select 
> att,filename,'table25' as mo_name from table25 UNION All select 
> att,filename,'table26' as mo_name from table26 UNION All select 
> att,filename,'table27' as mo_name from table27 UNION All select 
> att,filename,'table28' as mo_name from table28 UNION All select 
> att,filename,'table29' as mo_name from table29 UNION All select 
> att,filename,'table30' as mo_name from table30 UNION All select 
> att,filename,'table31' as mo_name from table31 UNION All select 
> att,filename,'table32' as mo_name from table32 UNION All select 
> att,filename,'table33' as mo_name from table33 UNION All select 
> att,filename,'table34' as mo_name from table34 UNION All select 
> att,filename,'table35' as mo_name from table35 UNION All select 
> att,filename,'table36' as mo_name from table36 UNION All select 
> att,filename,'table37' as mo_name from table37 UNION All select 
> att,filename,'table38' as mo_name from table38 UNION All select 
> att,filename,'table39' as mo_name from table39 UNION All select 
> att,filename,'table40' as mo_name from table40 UNION All select 
> att,filename,'table41' as mo_name from table41 UNION All select 
> att,filename,'table42' as mo_name from table42 UNION All select 
> att,filename,'table43' as mo_name from table43 UNION All select 
> att,filename,'table44' as mo_name from table44 UNION All select 
> att,filename,'table45' as mo_name from table45 UNION All select 
> att,filename,'table46' as mo_name from table46 UNION All select 
> att,filename,'table47' as mo_name from table47 UNION All select 
> 
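For reference, a minimal sketch of the workaround described in the report
above: disabling dynamic filtering through the table configuration before
planning. The option key and the EnvironmentSettings usage come from the
report; the class and the rest of the setup are illustrative.

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DisableDynamicFilteringSketch {
    public static void main(String[] args) {
        // Workaround from the report: turn off dynamic filtering (defaults
        // to true) so the plan for the large UNION ALL compiles.
        Configuration configuration = new Configuration();
        configuration.setString("table.optimizer.dynamic-filtering.enabled", "false");

        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                        .withConfiguration(configuration)
                        .inBatchMode()
                        .build());
    }
}
{code}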

[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-28 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849916#comment-17849916
 ] 

Hong Liang Teoh commented on FLINK-34672:
-

Ok - sounds good. Will proceed with the 1.19.1 RC without this.

> HA deadlock between JobMasterServiceLeadershipRunner and 
> DefaultLeaderElectionService
> -
>
> Key: FLINK-34672
> URL: https://issues.apache.org/jira/browse/FLINK-34672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JM within the HA system.
> (see below for the thread dump)
> [~mapohl] and I looked a bit into it and there appears to be a race condition 
> when leadership is revoked while a JobMaster is being started.
> It appears to be caused by 
> {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} 
> forwarding futures while holding a lock; depending on whether the forwarded 
> future is already complete, the next stage may or may not run while holding 
> that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 
> daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 
> nid=0x19d waiting for monitor entry  [0x7f53084fd000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
> - waiting to lock <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown
>  Source)
> at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown
>  Source)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
> - locked <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms 
> elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry  
> [0x7f530567d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
> - waiting to lock <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
> - locked <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
> at 
> 
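For context, the race described above is the standard CompletableFuture
pitfall: whether a dependent stage runs synchronously or asynchronously
depends on whether the upstream future is already complete when the callback
is registered. A minimal sketch with hypothetical names (not the actual
runner code):

{code:java}
import java.util.concurrent.CompletableFuture;

public class ForwardUnderLockSketch {
    private final Object lock = new Object();

    // Forwards 'source' into 'target' while holding 'lock', mirroring the
    // future-forwarding-under-lock pattern described in the issue above.
    void forward(CompletableFuture<String> source, CompletableFuture<String> target) {
        synchronized (lock) {
            source.whenComplete((value, error) -> {
                // If 'source' was already complete when whenComplete() was
                // called, this body runs synchronously on the current thread,
                // i.e. still inside 'lock'. Otherwise it runs later on the
                // completing thread, without holding 'lock'.
                if (error == null) {
                    target.complete(value);
                } else {
                    target.completeExceptionally(error);
                }
            });
        }
    }
}
{code}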

[jira] [Commented] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-05-28 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849915#comment-17849915
 ] 

Hong Liang Teoh commented on FLINK-34379:
-

[~jeyhunkarimov] could we please backport the patch to 1.19 + 1.18 as well?

> table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
> --
>
> Key: FLINK-34379
> URL: https://issues.apache.org/jira/browse/FLINK-34379
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.2, 1.18.1
> Environment: 1.17.1
>Reporter: zhu
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.3, 1.18.2, 1.20.0, 1.19.1
>
>
> In batch mode, I UNION ALL about 50 tables and then join the result with 
> another table. While compiling the execution plan, an OutOfMemoryError: Java 
> heap space is thrown, which causes the jobmanager to restart. This was not a 
> problem in 1.15.2, but both 1.17.2 and 1.18.1 throw the same error. It has 
> been traced to table.optimizer.dynamic-filtering.enabled, which defaults to 
> true; when I set table.optimizer.dynamic-filtering.enabled to false, the 
> plan compiles and executes normally.
> Code:
> TableEnvironment.create(EnvironmentSettings.newInstance()
> .withConfiguration(configuration)
> .inBatchMode().build())
> sql=select att,filename,'table0' as mo_name from table0 UNION All select 
> att,filename,'table1' as mo_name from table1 UNION All select 
> att,filename,'table2' as mo_name from table2 UNION All select 
> att,filename,'table3' as mo_name from table3 UNION All select 
> att,filename,'table4' as mo_name from table4 UNION All select 
> att,filename,'table5' as mo_name from table5 UNION All select 
> att,filename,'table6' as mo_name from table6 UNION All select 
> att,filename,'table7' as mo_name from table7 UNION All select 
> att,filename,'table8' as mo_name from table8 UNION All select 
> att,filename,'table9' as mo_name from table9 UNION All select 
> att,filename,'table10' as mo_name from table10 UNION All select 
> att,filename,'table11' as mo_name from table11 UNION All select 
> att,filename,'table12' as mo_name from table12 UNION All select 
> att,filename,'table13' as mo_name from table13 UNION All select 
> att,filename,'table14' as mo_name from table14 UNION All select 
> att,filename,'table15' as mo_name from table15 UNION All select 
> att,filename,'table16' as mo_name from table16 UNION All select 
> att,filename,'table17' as mo_name from table17 UNION All select 
> att,filename,'table18' as mo_name from table18 UNION All select 
> att,filename,'table19' as mo_name from table19 UNION All select 
> att,filename,'table20' as mo_name from table20 UNION All select 
> att,filename,'table21' as mo_name from table21 UNION All select 
> att,filename,'table22' as mo_name from table22 UNION All select 
> att,filename,'table23' as mo_name from table23 UNION All select 
> att,filename,'table24' as mo_name from table24 UNION All select 
> att,filename,'table25' as mo_name from table25 UNION All select 
> att,filename,'table26' as mo_name from table26 UNION All select 
> att,filename,'table27' as mo_name from table27 UNION All select 
> att,filename,'table28' as mo_name from table28 UNION All select 
> att,filename,'table29' as mo_name from table29 UNION All select 
> att,filename,'table30' as mo_name from table30 UNION All select 
> att,filename,'table31' as mo_name from table31 UNION All select 
> att,filename,'table32' as mo_name from table32 UNION All select 
> att,filename,'table33' as mo_name from table33 UNION All select 
> att,filename,'table34' as mo_name from table34 UNION All select 
> att,filename,'table35' as mo_name from table35 UNION All select 
> att,filename,'table36' as mo_name from table36 UNION All select 
> att,filename,'table37' as mo_name from table37 UNION All select 
> att,filename,'table38' as mo_name from table38 UNION All select 
> att,filename,'table39' as mo_name from table39 UNION All select 
> att,filename,'table40' as mo_name from table40 UNION All select 
> att,filename,'table41' as mo_name from table41 UNION All select 
> att,filename,'table42' as mo_name from table42 UNION All select 
> att,filename,'table43' as mo_name from table43 UNION All select 
> att,filename,'table44' as mo_name from table44 UNION All select 
> att,filename,'table45' as mo_name from table45 UNION All select 
> att,filename,'table46' as mo_name from table46 UNION All select 
> att,filename,'table47' as mo_name from table47 UNION All select 
> att,filename,'table48' as mo_name from table48 UNION All select 
> att,filename,'table49' as mo_name from table49 UNION All select 
> att,filename,'table50' as mo_name from table50 

[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-22 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848646#comment-17848646
 ] 

Hong Liang Teoh commented on FLINK-34672:
-

[~mapohl] do we know if this should be a blocker for the Flink 1.19.1 patch 
release?

> HA deadlock between JobMasterServiceLeadershipRunner and 
> DefaultLeaderElectionService
> -
>
> Key: FLINK-34672
> URL: https://issues.apache.org/jira/browse/FLINK-34672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JM within the HA system.
> (see below for the thread dump)
> [~mapohl] and I looked a bit into it and there appears to be a race condition 
> when leadership is revoked while a JobMaster is being started.
> It appears to be caused by 
> {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} 
> forwarding futures while holding a lock; depending on whether the forwarded 
> future is already complete, the next stage may or may not run while holding 
> that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 
> daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 
> nid=0x19d waiting for monitor entry  [0x7f53084fd000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
> - waiting to lock <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown
>  Source)
> at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown
>  Source)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
> - locked <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms 
> elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry  
> [0x7f530567d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
> - waiting to lock <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
> - locked <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
> at 

[jira] [Comment Edited] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-22 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848646#comment-17848646
 ] 

Hong Liang Teoh edited comment on FLINK-34672 at 5/22/24 2:29 PM:
--

[~mapohl] do we have any updates here? Wondering if this should be a blocker 
for the Flink 1.19.1 patch release!


was (Author: hong):
[~mapohl] do we know if this should be a blocker for the Flink 1.19.1 patch 
release?

> HA deadlock between JobMasterServiceLeadershipRunner and 
> DefaultLeaderElectionService
> -
>
> Key: FLINK-34672
> URL: https://issues.apache.org/jira/browse/FLINK-34672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JM within the HA system.
> (see below for the thread dump)
> [~mapohl] and I looked a bit into it and there appears to be a race condition 
> when leadership is revoked while a JobMaster is being started.
> It appears to be caused by 
> {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} 
> forwarding futures while holding a lock; depending on whether the forwarded 
> future is already complete, the next stage may or may not run while holding 
> that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 
> daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 
> nid=0x19d waiting for monitor entry  [0x7f53084fd000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
> - waiting to lock <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown
>  Source)
> at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown
>  Source)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
> - locked <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms 
> elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry  
> [0x7f530567d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
> - waiting to lock <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
> - locked <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown
>  Source)
> at 
> 

[jira] [Commented] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-05-22 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848644#comment-17848644
 ] 

Hong Liang Teoh commented on FLINK-34379:
-

[~jeyhunkarimov] Any progress on this patch?

> table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
> --
>
> Key: FLINK-34379
> URL: https://issues.apache.org/jira/browse/FLINK-34379
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.2, 1.18.1
> Environment: 1.17.1
>Reporter: zhu
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.3, 1.18.2, 1.20.0, 1.19.1
>
>
> In batch mode, I UNION ALL about 50 tables and then join the result with 
> another table. While compiling the execution plan, an OutOfMemoryError: Java 
> heap space is thrown, which causes the jobmanager to restart. This was not a 
> problem in 1.15.2, but both 1.17.2 and 1.18.1 throw the same error. It has 
> been traced to table.optimizer.dynamic-filtering.enabled, which defaults to 
> true; when I set table.optimizer.dynamic-filtering.enabled to false, the 
> plan compiles and executes normally.
> Code:
> TableEnvironment.create(EnvironmentSettings.newInstance()
> .withConfiguration(configuration)
> .inBatchMode().build())
> sql=select att,filename,'table0' as mo_name from table0 UNION All select 
> att,filename,'table1' as mo_name from table1 UNION All select 
> att,filename,'table2' as mo_name from table2 UNION All select 
> att,filename,'table3' as mo_name from table3 UNION All select 
> att,filename,'table4' as mo_name from table4 UNION All select 
> att,filename,'table5' as mo_name from table5 UNION All select 
> att,filename,'table6' as mo_name from table6 UNION All select 
> att,filename,'table7' as mo_name from table7 UNION All select 
> att,filename,'table8' as mo_name from table8 UNION All select 
> att,filename,'table9' as mo_name from table9 UNION All select 
> att,filename,'table10' as mo_name from table10 UNION All select 
> att,filename,'table11' as mo_name from table11 UNION All select 
> att,filename,'table12' as mo_name from table12 UNION All select 
> att,filename,'table13' as mo_name from table13 UNION All select 
> att,filename,'table14' as mo_name from table14 UNION All select 
> att,filename,'table15' as mo_name from table15 UNION All select 
> att,filename,'table16' as mo_name from table16 UNION All select 
> att,filename,'table17' as mo_name from table17 UNION All select 
> att,filename,'table18' as mo_name from table18 UNION All select 
> att,filename,'table19' as mo_name from table19 UNION All select 
> att,filename,'table20' as mo_name from table20 UNION All select 
> att,filename,'table21' as mo_name from table21 UNION All select 
> att,filename,'table22' as mo_name from table22 UNION All select 
> att,filename,'table23' as mo_name from table23 UNION All select 
> att,filename,'table24' as mo_name from table24 UNION All select 
> att,filename,'table25' as mo_name from table25 UNION All select 
> att,filename,'table26' as mo_name from table26 UNION All select 
> att,filename,'table27' as mo_name from table27 UNION All select 
> att,filename,'table28' as mo_name from table28 UNION All select 
> att,filename,'table29' as mo_name from table29 UNION All select 
> att,filename,'table30' as mo_name from table30 UNION All select 
> att,filename,'table31' as mo_name from table31 UNION All select 
> att,filename,'table32' as mo_name from table32 UNION All select 
> att,filename,'table33' as mo_name from table33 UNION All select 
> att,filename,'table34' as mo_name from table34 UNION All select 
> att,filename,'table35' as mo_name from table35 UNION All select 
> att,filename,'table36' as mo_name from table36 UNION All select 
> att,filename,'table37' as mo_name from table37 UNION All select 
> att,filename,'table38' as mo_name from table38 UNION All select 
> att,filename,'table39' as mo_name from table39 UNION All select 
> att,filename,'table40' as mo_name from table40 UNION All select 
> att,filename,'table41' as mo_name from table41 UNION All select 
> att,filename,'table42' as mo_name from table42 UNION All select 
> att,filename,'table43' as mo_name from table43 UNION All select 
> att,filename,'table44' as mo_name from table44 UNION All select 
> att,filename,'table45' as mo_name from table45 UNION All select 
> att,filename,'table46' as mo_name from table46 UNION All select 
> att,filename,'table47' as mo_name from table47 UNION All select 
> att,filename,'table48' as mo_name from table48 UNION All select 
> att,filename,'table49' as mo_name from table49 UNION All select 
> att,filename,'table50' as mo_name from table50 UNION All select 
> 

[jira] [Resolved] (FLINK-35383) Update compatibility matrix to include 1.19 release

2024-05-21 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-35383.
-
Resolution: Fixed

> Update compatibility matrix to include 1.19 release
> ---
>
> Key: FLINK-35383
> URL: https://issues.apache.org/jira/browse/FLINK-35383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> Update compatibility matrix in documentation to include Flink 1.19 release:
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/upgrading/#compatibility-table
> https://nightlies.apache.org/flink/flink-docs-release-1.19/zh/docs/ops/upgrading/#%E5%85%BC%E5%AE%B9%E6%80%A7%E9%80%9F%E6%9F%A5%E8%A1%A8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35383) Update compatibility matrix to include 1.19 release

2024-05-21 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848246#comment-17848246
 ] 

Hong Liang Teoh commented on FLINK-35383:
-

merged commit 
[{{8b73ca9}}|https://github.com/apache/flink/commit/8b73ca955bcd796916597046dd8aa80407d0aa07]
 into apache:master

> Update compatibility matrix to include 1.19 release
> ---
>
> Key: FLINK-35383
> URL: https://issues.apache.org/jira/browse/FLINK-35383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> Update compatibility matrix in documentation to include Flink 1.19 release:
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/upgrading/#compatibility-table
> https://nightlies.apache.org/flink/flink-docs-release-1.19/zh/docs/ops/upgrading/#%E5%85%BC%E5%AE%B9%E6%80%A7%E9%80%9F%E6%9F%A5%E8%A1%A8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35383) Update compatibility matrix to include 1.19 release

2024-05-21 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848247#comment-17848247
 ] 

Hong Liang Teoh commented on FLINK-35383:
-

merged commit 
[{{c4af0c3}}|https://github.com/apache/flink/commit/c4af0c388d07316282819dc2741fa7bc758fa767]
 into apache:release-1.19

> Update compatibility matrix to include 1.19 release
> ---
>
> Key: FLINK-35383
> URL: https://issues.apache.org/jira/browse/FLINK-35383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> Update compatibility matrix in documentation to include Flink 1.19 release:
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/upgrading/#compatibility-table
> https://nightlies.apache.org/flink/flink-docs-release-1.19/zh/docs/ops/upgrading/#%E5%85%BC%E5%AE%B9%E6%80%A7%E9%80%9F%E6%9F%A5%E8%A1%A8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35396) DynamoDB Streams Consumer consumption data is out of order

2024-05-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847700#comment-17847700
 ] 

Hong Liang Teoh commented on FLINK-35396:
-

Here is the parent Jira that will introduce the new DynamoDB Streams connector 
based on the FLIP-27 Source APIs: https://issues.apache.org/jira/browse/FLINK-24438

This is the Jira that will address parent-child shard ordering: 
https://issues.apache.org/jira/browse/FLINK-32218

If that is ok with you, I will resolve this issue as a duplicate of FLINK-32218.

> DynamoDB Streams Consumer consumption data is out of order
> --
>
> Key: FLINK-35396
> URL: https://issues.apache.org/jira/browse/FLINK-35396
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS, Connectors / Kinesis
>Affects Versions: 1.16.2, aws-connector-4.2.0
>Reporter: Suxing Lee
>Priority: Major
>  Labels: AWS
>
> When we use `FlinkDynamoDBStreamsConsumer` in 
> `flink-connector-aws/flink-connector-kinesis` to consume DynamoDB stream 
> data, records are delivered out of order.
> The relevant service log is as follows:
> {noformat}
> 2024-05-06 00:00:40,639 INFO  
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher [] 
> - Subtask 0 has discovered a new shard 
> StreamShardHandle{streamName='arn:aws:dynamodb:ap-southeast-1:***', 
> shard='{ShardId: shardId-0001714924828427-d73b6b68,
>   ParentShardId: shardId-0001714910797443-fb1d3b22,HashKeyRange: 
> {StartingHashKey: 0,EndingHashKey: 1},SequenceNumberRange: 
> {StartingSequenceNumber: 29583764058201168012,}}'} due to resharding, 
> and will start consuming the shard from sequence number EARLIEST_SEQUENCE_NUM 
> with ShardConsumer 2807
> ..
> ..
> ..
> 2024-05-06 00:00:46,729 INFO  
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher [] 
> - Subtask 0 has reached the end of subscribed shard: 
> StreamShardHandle{streamName='arn:aws:dynamodb:ap-southeast-1:***', 
> shard='{ShardId: shardId-0001714910797443-fb1d3b22,ParentShardId: 
> shardId-0001714897099372-17932b9a,HashKeyRange: {StartingHashKey: 
> 0,EndingHashKey: 1},SequenceNumberRange: {StartingSequenceNumber: 
> 29554409051102788386,}}'}
> {noformat}
> It looks like the failure sequence is:
> `2024-05-06 00:00:40,639` A new child shard (ShardId: 
> shardId-0001714924828427-d73b6b68) is discovered and consumed immediately.
> `2024-05-06 00:00:46,729` Consumption of the old parent shard (ShardId: 
> shardId-0001714910797443-fb1d3b22) ends.
> There was a gap of 6 seconds. In other words, the child shard started 
> consuming data before consumption of the parent shard had finished. This 
> causes the data we read to be sent downstream out of order.
> https://github.com/apache/flink-connector-aws/blob/c688a8545ac1001c8450e8c9c5fe8bbafa13aeba/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L689-L740
>  
> This appears to be because a `ShardConsumer` is submitted to 
> `shardConsumersExecutor` as soon as `discoverNewShards` finds a shard, and 
> `shardConsumersExecutor` is created via Executors.newCachedThreadPool(), 
> which does not limit the number of threads, so new and old shards are 
> consumed at the same time and data consumption ends up out of order?
> `flink-connector-kinesis` relies on `dynamodb-streams-kinesis-adapter` to 
> subscribe to messages from the DynamoDB stream. But why does consuming data 
> directly through `dynamodb-streams-kinesis-adapter` not show similar 
> problems?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35396) DynamoDB Streams Consumer consumption data is out of order

2024-05-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847698#comment-17847698
 ] 

Hong Liang Teoh commented on FLINK-35396:
-

Hi [~Suxing Lee], thanks for reporting this problem. This issue is not the 
fault of the DynamoDB Streams adapter, but rather a limitation of the 
FlinkKinesisConsumer itself, where parent-child shard ordering is not 
respected. See the associated Jira FLINK-6349.

In flink-connector-aws/flink-connector-kinesis, the FlinkDynamoDBStreamsConsumer 
uses the same framework as the FlinkKinesisConsumer, but the Kinesis client is 
swapped out for a DynamoDB Streams client via the 
dynamodb-streams-kinesis-adapter. Since the base logic of the 
FlinkKinesisConsumer doesn't respect parent-child shard ordering, it doesn't 
matter what the adapter does: the moment a shard is created, the Flink job 
will discover it and read from it.

We are working on a new DynamoDB Streams connector based on FLIP-27, which 
should resolve this issue.

> DynamoDB Streams Consumer consumption data is out of order
> --
>
> Key: FLINK-35396
> URL: https://issues.apache.org/jira/browse/FLINK-35396
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS, Connectors / Kinesis
>Affects Versions: 1.16.2, aws-connector-4.2.0
>Reporter: Suxing Lee
>Priority: Major
>  Labels: AWS
>
> When we use `FlinkDynamoDBStreamsConsumer` in 
> `flink-connector-aws/flink-connector-kinesis` to consume DynamoDB stream 
> data, records are delivered out of order.
> The relevant service log is as follows:
> {noformat}
> 2024-05-06 00:00:40,639 INFO  
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher [] 
> - Subtask 0 has discovered a new shard 
> StreamShardHandle{streamName='arn:aws:dynamodb:ap-southeast-1:***', 
> shard='{ShardId: shardId-0001714924828427-d73b6b68,
>   ParentShardId: shardId-0001714910797443-fb1d3b22,HashKeyRange: 
> {StartingHashKey: 0,EndingHashKey: 1},SequenceNumberRange: 
> {StartingSequenceNumber: 29583764058201168012,}}'} due to resharding, 
> and will start consuming the shard from sequence number EARLIEST_SEQUENCE_NUM 
> with ShardConsumer 2807
> ..
> ..
> ..
> 2024-05-06 00:00:46,729 INFO  
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher [] 
> - Subtask 0 has reached the end of subscribed shard: 
> StreamShardHandle{streamName='arn:aws:dynamodb:ap-southeast-1:***', 
> shard='{ShardId: shardId-0001714910797443-fb1d3b22,ParentShardId: 
> shardId-0001714897099372-17932b9a,HashKeyRange: {StartingHashKey: 
> 0,EndingHashKey: 1},SequenceNumberRange: {StartingSequenceNumber: 
> 29554409051102788386,}}'}
> {noformat}
> It looks like the failure sequence is:
> `2024-05-06 00:00:40,639` A new child shard (ShardId: 
> shardId-0001714924828427-d73b6b68) is discovered and consumed immediately.
> `2024-05-06 00:00:46,729` Consumption of the old parent shard (ShardId: 
> shardId-0001714910797443-fb1d3b22) ends.
> There was a gap of 6 seconds. In other words, the child shard started 
> consuming data before consumption of the parent shard had finished. This 
> causes the data we read to be sent downstream out of order.
> https://github.com/apache/flink-connector-aws/blob/c688a8545ac1001c8450e8c9c5fe8bbafa13aeba/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L689-L740
>  
> This appears to be because a `ShardConsumer` is submitted to 
> `shardConsumersExecutor` as soon as `discoverNewShards` finds a shard, and 
> `shardConsumersExecutor` is created via Executors.newCachedThreadPool(), 
> which does not limit the number of threads, so new and old shards are 
> consumed at the same time and data consumption ends up out of order?
> `flink-connector-kinesis` relies on `dynamodb-streams-kinesis-adapter` to 
> subscribe to messages from the DynamoDB stream. But why does consuming data 
> directly through `dynamodb-streams-kinesis-adapter` not show similar 
> problems?
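A minimal sketch of the unbounded-executor behaviour asked about above (the
Runnables are hypothetical stand-ins for ShardConsumer): a cached thread pool
gives every submitted task a thread immediately, so nothing enforces
parent-before-child shard ordering.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShardOrderingSketch {
    public static void main(String[] args) {
        // Like shardConsumersExecutor: no upper bound on threads.
        ExecutorService shardConsumersExecutor = Executors.newCachedThreadPool();

        Runnable parentShardConsumer = () -> System.out.println("draining parent shard");
        Runnable childShardConsumer = () -> System.out.println("reading child shard");

        // Both start right away on separate threads; the child consumer does
        // not wait for the parent shard to reach its end.
        shardConsumersExecutor.submit(parentShardConsumer);
        shardConsumersExecutor.submit(childShardConsumer);

        shardConsumersExecutor.shutdown();
    }
}
{code}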



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-35269) Fix logging level for errors in AWS connector sinks

2024-05-17 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-35269.
-
  Assignee: Aleksandr Pilipenko
Resolution: Fixed

> Fix logging level for errors in AWS connector sinks
> ---
>
> Key: FLINK-35269
> URL: https://issues.apache.org/jira/browse/FLINK-35269
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS, Connectors / Firehose, Connectors / 
> Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.3.0
>
>
> Currently Kinesis and Firehose sinks log information about failed records at 
> DEBUG level.
> This makes failures invisible in the logs, limiting ability to investigate 
> source of the issues.
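Until a release containing the fix is picked up, an override along these
lines can surface the failure details. This is a hypothetical log4j2 snippet;
the package names are assumptions about where the sink writers log, not taken
from the ticket.

{code}
# Hypothetical log4j2 override: raise the (assumed) AWS sink packages to
# DEBUG so failed-record details become visible in the logs.
logger.kinesis_sink.name = org.apache.flink.connector.kinesis.sink
logger.kinesis_sink.level = DEBUG
logger.firehose_sink.name = org.apache.flink.connector.firehose.sink
logger.firehose_sink.level = DEBUG
{code}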



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35269) Fix logging level for errors in AWS connector sinks

2024-05-17 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847213#comment-17847213
 ] 

Hong Liang Teoh commented on FLINK-35269:
-

merged commit 
[{{c688a85}}|https://github.com/apache/flink-connector-aws/commit/c688a8545ac1001c8450e8c9c5fe8bbafa13aeba]
 into apache:main

> Fix logging level for errors in AWS connector sinks
> ---
>
> Key: FLINK-35269
> URL: https://issues.apache.org/jira/browse/FLINK-35269
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS, Connectors / Firehose, Connectors / 
> Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> Currently Kinesis and Firehose sinks log information about failed records at 
> DEBUG level.
> This makes failures invisible in the logs, limiting ability to investigate 
> source of the issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35269) Fix logging level for errors in AWS connector sinks

2024-05-17 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-35269:

Fix Version/s: aws-connector-4.3.0

> Fix logging level for errors in AWS connector sinks
> ---
>
> Key: FLINK-35269
> URL: https://issues.apache.org/jira/browse/FLINK-35269
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS, Connectors / Firehose, Connectors / 
> Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.3.0
>
>
> Currently Kinesis and Firehose sinks log information about failed records at 
> DEBUG level.
> This makes failures invisible in the logs, limiting ability to investigate 
> source of the issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35383) Update compatibility matrix to include 1.19 release

2024-05-16 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-35383:
---

Assignee: Aleksandr Pilipenko

> Update compatibility matrix to include 1.19 release
> ---
>
> Key: FLINK-35383
> URL: https://issues.apache.org/jira/browse/FLINK-35383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> Update compatibility matrix in documentation to include Flink 1.19 release:
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/upgrading/#compatibility-table
> https://nightlies.apache.org/flink/flink-docs-release-1.19/zh/docs/ops/upgrading/#%E5%85%BC%E5%AE%B9%E6%80%A7%E9%80%9F%E6%9F%A5%E8%A1%A8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35299) FlinkKinesisConsumer does not respect StreamInitialPosition for new Kinesis Stream when restoring from snapshot

2024-05-06 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-35299:

Description: 
h3. What

The FlinkKinesisConsumer allows users to read from [multiple Kinesis 
Streams|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L224].

Users can also specify a STREAM_INITIAL_POSITION, which configures whether the 
consumer starts reading the stream from TRIM_HORIZON / LATEST / AT_TIMESTAMP.

When restoring the Kinesis Consumer from an existing snapshot, users can 
configure the consumer to read from additional Kinesis Streams. The expected 
behavior would be for the FlinkKinesisConsumer to start reading from the 
additional Kinesis Streams respecting the STREAM_INITIAL_POSITION 
configuration. However, we find that it currently reads from TRIM_HORIZON.

This is surprising behavior and should be corrected.
h3. Why

Principle of Least Astonishment
h3. How

We recommend that we reconstruct the previously seen streams by iterating 
through the [sequenceNumsStateForCheckpoint in 
FlinkKinesisConsumer#initializeState()|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L454].
h3. Risks

This might increase the state restore time. We can consider adding a feature 
flag for users to turn this check off.

  was:
h3. What

The FlinkKinesisConsumer allows users to read from [multiple Kinesis 
Streams|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L224].
 

Users can also specify a STREAM_INITIAL_POSITION, which configures whether the 
consumer starts reading the stream from TRIM_HORIZON / LATEST / AT_TIMESTAMP.

When restoring the Kinesis Consumer from an existing snapshot, users can 
configure the consumer to read from additional Kinesis Streams. The expected 
behavior would be for the FlinkKinesisConsumer to start reading from the 
additional Kinesis Streams respecting the STREAM_INITIAL_POSITION 
configuration. However, we find that it currently reads from TRIM_HORIZON.

This is surprising behavior and should be corrected.
h3. Why

Principle of Least Astonishment
h3. How

We recommend that we reconstruct the previously seen streams by iterating 
through the [sequenceNumsStateForCheckpoint in 
FlinkKinesisConsumer#initializeState()|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L454].

 


> FlinkKinesisConsumer does not respect StreamInitialPosition for new Kinesis 
> Stream when restoring from snapshot
> ---
>
> Key: FLINK-35299
> URL: https://issues.apache.org/jira/browse/FLINK-35299
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Hong Liang Teoh
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h3. What
> The FlinkKinesisConsumer allows users to read from [multiple Kinesis 
> Streams|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L224].
> Users can also specify a STREAM_INITIAL_POSITION, which configures whether 
> the consumer starts reading the stream from TRIM_HORIZON / LATEST / 
> AT_TIMESTAMP.
> When restoring the Kinesis Consumer from an existing snapshot, users can 
> configure the consumer to read from additional Kinesis Streams. The expected 
> behavior would be for the FlinkKinesisConsumer to start reading from the 
> additional Kinesis Streams respecting the STREAM_INITIAL_POSITION 
> configuration. However, we find that it currently reads from TRIM_HORIZON.
> This is surprising behavior and should be corrected.
> h3. Why
> Principle of Least Astonishment
> h3. How
> We recommend that we reconstruct the previously seen streams by iterating 
> through the [sequenceNumsStateForCheckpoint in 
> FlinkKinesisConsumer#initializeState()|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L454].
> h3. Risks
> This might increase the state restore time. We can consider adding a feature 
> flag for users to turn this check off.
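For illustration, the configuration at issue looks roughly like the sketch
below. Stream names, region, and the deserialization schema are placeholders;
per the report, the LATEST setting is currently ignored for streams first
seen after a restore, which instead start from TRIM_HORIZON.

{code:java}
import java.util.Arrays;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class InitialPositionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        // Expected to apply to streams added after restoring from a snapshot
        // as well; per the report they currently start from TRIM_HORIZON.
        props.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        FlinkKinesisConsumer<String> consumer = new FlinkKinesisConsumer<>(
                Arrays.asList("existing-stream", "newly-added-stream"),
                new SimpleStringSchema(),
                props);
    }
}
{code}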



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35299) FlinkKinesisConsumer does not respect StreamInitialPosition for new Kinesis Stream when restoring from snapshot

2024-05-06 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-35299:
---

 Summary: FlinkKinesisConsumer does not respect 
StreamInitialPosition for new Kinesis Stream when restoring from snapshot
 Key: FLINK-35299
 URL: https://issues.apache.org/jira/browse/FLINK-35299
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: aws-connector-4.2.0
Reporter: Hong Liang Teoh
 Fix For: aws-connector-4.4.0


h3. What

The FlinkKinesisConsumer allows users to read from [multiple Kinesis 
Streams|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L224].
 

Users can also specify a STREAM_INITIAL_POSITION, which configures whether the 
consumer starts reading the stream from TRIM_HORIZON / LATEST / AT_TIMESTAMP.

When restoring the Kinesis Consumer from an existing snapshot, users can 
configure the consumer to read from additional Kinesis Streams. The expected 
behavior would be for the FlinkKinesisConsumer to start reading from the 
additional Kinesis Streams respecting the STREAM_INITIAL_POSITION 
configuration. However, we find that it currently reads from TRIM_HORIZON.

This is surprising behavior and should be corrected.
h3. Why

Principle of Least Astonishment
h3. How

We recommend that we reconstruct the previously seen streams by iterating 
through the [sequenceNumsStateForCheckpoint in 
FlinkKinesisConsumer#initializeState()|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L454].

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35069) ContinuousProcessingTimeTrigger continuously registers timers in a loop at the end of the window

2024-04-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838962#comment-17838962
 ] 

Hong Liang Teoh commented on FLINK-35069:
-

Thanks [~lijinzhong] for the fix! Are we planning to backport this fix to 1.18 
and 1.19 as well?

> ContinuousProcessingTimeTrigger continuously registers timers in a loop at 
> the end of the window
> 
>
> Key: FLINK-35069
> URL: https://issues.apache.org/jira/browse/FLINK-35069
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.17.0
>Reporter: Jinzhong Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-04-09-20-23-54-415.png
>
>
> In our production environment,  when TumblingEventTimeWindows and 
> ContinuousProcessingTimeTrigger are used in combination within a 
> WindowOperator, we observe a situation where the timers are continuously 
> registered in a loop at the end of the window, leading to the job being 
> perpetually stuck in timer processing.
> !image-2024-04-09-20-23-54-415.png|width=516,height=205!
> This issue can be reproduced using the 
> [UT|https://github.com/apache/flink/blob/8e80ff889701ed1abbb5c15cd3943b254f1fb5cc/flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/ContinuousProcessingTimeTriggerTest.java#L177]
>  provided by the PR.
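For reference, the combination that triggers the loop looks like the sketch
below; the stream type, key selector, and intervals are illustrative, not
taken from the ticket.

{code:java}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class ContinuousTriggerSketch {
    // 'input' is assumed to carry (key, value) pairs with timestamps and
    // watermarks already assigned upstream.
    static SingleOutputStreamOperator<Tuple2<String, Long>> windowed(
            DataStream<Tuple2<String, Long>> input) {
        return input
                .keyBy(t -> t.f0)
                // Event-time window fired early by a processing-time trigger:
                // the pairing reported above to loop on timer registration at
                // the end of the window.
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
                .reduce((a, b) -> b);
    }
}
{code}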



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35134) Release flink-connector-elasticsearch vX.X.X for Flink 1.19

2024-04-17 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-35134:
---

Assignee: Hong Liang Teoh

> Release flink-connector-elasticsearch vX.X.X for Flink 1.19
> ---
>
> Key: FLINK-35134
> URL: https://issues.apache.org/jira/browse/FLINK-35134
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / ElasticSearch
>Reporter: Danny Cranmer
>Assignee: Hong Liang Teoh
>Priority: Major
>
> https://github.com/apache/flink-connector-elasticsearch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35115) Kinesis connector writes wrong Kinesis sequence number at stop with savepoint

2024-04-15 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837319#comment-17837319
 ] 

Hong Liang Teoh commented on FLINK-35115:
-

[~dannycranmer] That is true, unless it is a problem with 2PC on the sink. The 
main difference for stop-with-savepoint is in the side effects executed on the 
sink.

Let's try to replicate this first!

> Kinesis connector writes wrong Kinesis sequence number at stop with savepoint
> -
>
> Key: FLINK-35115
> URL: https://issues.apache.org/jira/browse/FLINK-35115
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.16.3, 1.18.1
> Environment: The issue happens in a *Kinesis -> Flink -> Kafka* 
> exactly-once setup with:
>  * Flink versions checked 1.16.3 and 1.18.1
>  * Kinesis connector checked 1.16.3 and 4.2.0-1.18
>  * checkpointing configured at 1 minute with EXACTLY_ONCE mode: 
> {code:java}
> StreamExecutionEnvironment execEnv = StreamExecutionEnvironment.getExecutionEnvironment (); 
> execEnv.enableCheckpointing (60000, EXACTLY_ONCE); 
> execEnv.getCheckpointConfig ().setCheckpointTimeout (90000); 
> execEnv.getCheckpointConfig ().setCheckpointStorage (CHECKPOINTS_PATH); {code}
>  * Kafka sink configured with EXACTLY_ONCE semantic/delivery guarantee:
> {code:java}
> Properties sinkConfig = new Properties ();
> sinkConfig.put ("transaction.timeout.ms", 480000);
> KafkaSink sink = KafkaSink.builder ()
> .setBootstrapServers ("localhost:9092")
> .setTransactionalIdPrefix ("test-prefix")
> .setDeliverGuarantee (EXACTLY_ONCE)
> .setKafkaProducerConfig (sinkConfig)
> .setRecordSerializer (
> (KafkaRecordSerializationSchema) (element, context, 
> timestamp) -> new ProducerRecord<> (
> "test-output-topic", null, element.getBytes ()))
> .build (); {code}
>  * Kinesis consumer defined as: 
> {code:java}
> FlinkKinesisConsumer flinkKinesisConsumer = new
> FlinkKinesisConsumer<> ("test-stream",
> new AbstractDeserializationSchema<> () {
> @Override
> public ByteBuffer deserialize (byte[] bytes) {
> // Return
> return ByteBuffer.wrap (bytes);
> }
> }, props); {code}
>  
>Reporter: Vadim Vararu
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: kinesis
>
> Having an exactly-once Kinesis -> Flink -> Kafka job and triggering a 
> stop-with-savepoint, Flink duplicates in Kafka all the records between the 
> last checkpoint and the savepoint at resume:
>  * Event1 is written to Kinesis
>  * Event1 is processed by Flink 
>  * Event1 is committed to Kafka at the checkpoint
>  * 
> 
>  * Event2 is written to Kinesis
>  * Event2 is processed by Flink
>  * Stop with savepoint is triggered manually
>  * Event2 is committed to Kafka
>  * 
> 
>  * Job is resumed from the savepoint
>  * *{color:#FF0000}Event2 is written again to Kafka at the first 
> checkpoint{color}*
>  
> {color:#172b4d}I believe that it's a Kinesis connector issue, for 2 
> reasons:{color}
>  * I've checked the actual Kinesis sequence number in the _metadata file 
> generated at stop-with-savepoint, and it is the one from the checkpoint 
> before the savepoint instead of the one of the last record committed to 
> Kafka.
>  * I've tested exactly the same job with Kafka as the source instead of 
> Kinesis, and the behaviour does not reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35115) Kinesis connector writes wrong Kinesis sequence number at stop with savepoint

2024-04-15 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837315#comment-17837315
 ] 

Hong Liang Teoh commented on FLINK-35115:
-

[~a.pilipenko] Assigned to you, as you mentioned you are looking into it.

> Kinesis connector writes wrong Kinesis sequence number at stop with savepoint
> -
>
> Key: FLINK-35115
> URL: https://issues.apache.org/jira/browse/FLINK-35115
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.16.3, 1.18.1
> Environment: The issue happens in a *Kinesis -> Flink -> Kafka* 
> exactly-once setup with:
>  * Flink versions checked 1.16.3 and 1.18.1
>  * Kinesis connector checked 1.16.3 and 4.2.0-1.18
>  * checkpointing configured at 1 minute with EXACTLY_ONCE mode: 
> {code:java}
> StreamExecutionEnvironment execEnv =
> StreamExecutionEnvironment.getExecutionEnvironment();
> // 1-minute checkpoint interval, per the environment description above
> execEnv.enableCheckpointing(60000, EXACTLY_ONCE);
> execEnv.getCheckpointConfig().setCheckpointTimeout(9);
> execEnv.getCheckpointConfig().setCheckpointStorage(CHECKPOINTS_PATH); {code}
>  * Kafka sink configured with EXACTLY_ONCE semantic/delivery guarantee:
> {code:java}
> Properties sinkConfig = new Properties();
> sinkConfig.put("transaction.timeout.ms", 48);
> KafkaSink<String> sink = KafkaSink.<String>builder()
> .setBootstrapServers("localhost:9092")
> .setTransactionalIdPrefix("test-prefix")
> .setDeliverGuarantee(EXACTLY_ONCE)
> .setKafkaProducerConfig(sinkConfig)
> .setRecordSerializer(
> (KafkaRecordSerializationSchema<String>) (element, context,
> timestamp) -> new ProducerRecord<>(
> "test-output-topic", null, element.getBytes()))
> .build(); {code}
>  * Kinesis consumer defined as: 
> {code:java}
> FlinkKinesisConsumer<ByteBuffer> flinkKinesisConsumer = new
> FlinkKinesisConsumer<>("test-stream",
> new AbstractDeserializationSchema<>() {
> @Override
> public ByteBuffer deserialize(byte[] bytes) {
> // Wrap the raw bytes; no further deserialization needed
> return ByteBuffer.wrap(bytes);
> }
> }, props); {code}
>  
>Reporter: Vadim Vararu
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: kinesis
>
> With an exactly-once Kinesis -> Flink -> Kafka job, triggering a 
> stop-with-savepoint makes Flink duplicate in Kafka, at resume, all the 
> records between the last checkpoint and the savepoint:
>  * Event1 is written to Kinesis
>  * Event1 is processed by Flink
>  * Event1 is committed to Kafka at the checkpoint
> 
>  * Event2 is written to Kinesis
>  * Event2 is processed by Flink
>  * Stop with savepoint is triggered manually
>  * Event2 is committed to Kafka
> 
>  * Job is resumed from the savepoint
>  * *{color:#FF}Event2 is written again to Kafka at the first 
> checkpoint{color}*
>  
> {color:#172b4d}I believe that it's a Kinesis connector issue, for 2 
> reasons:{color}
>  * I've checked the actual Kinesis sequence number in the _metadata file 
> generated at stop-with-savepoint, and it's the one from the checkpoint before 
> the savepoint instead of the one of the last record committed to Kafka.
>  * I've tested exactly the same job with Kafka as the source instead of 
> Kinesis, and the behaviour does not reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35115) Kinesis connector writes wrong Kinesis sequence number at stop with savepoint

2024-04-15 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-35115:
---

Assignee: Aleksandr Pilipenko

> Kinesis connector writes wrong Kinesis sequence number at stop with savepoint
> -
>
> Key: FLINK-35115
> URL: https://issues.apache.org/jira/browse/FLINK-35115
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.16.3, 1.18.1
> Environment: The issue happens in a *Kinesis -> Flink -> Kafka* 
> exactly-once setup with:
>  * Flink versions checked 1.16.3 and 1.18.1
>  * Kinesis connector checked 1.16.3 and 4.2.0-1.18
>  * checkpointing configured at 1 minute with EXACTLY_ONCE mode: 
> {code:java}
> StreamExecutionEnvironment execEnv =
> StreamExecutionEnvironment.getExecutionEnvironment();
> // 1-minute checkpoint interval, per the environment description above
> execEnv.enableCheckpointing(60000, EXACTLY_ONCE);
> execEnv.getCheckpointConfig().setCheckpointTimeout(9);
> execEnv.getCheckpointConfig().setCheckpointStorage(CHECKPOINTS_PATH); {code}
>  * Kafka sink configured with EXACTLY_ONCE semantic/delivery guarantee:
> {code:java}
> Properties sinkConfig = new Properties();
> sinkConfig.put("transaction.timeout.ms", 48);
> KafkaSink<String> sink = KafkaSink.<String>builder()
> .setBootstrapServers("localhost:9092")
> .setTransactionalIdPrefix("test-prefix")
> .setDeliverGuarantee(EXACTLY_ONCE)
> .setKafkaProducerConfig(sinkConfig)
> .setRecordSerializer(
> (KafkaRecordSerializationSchema<String>) (element, context,
> timestamp) -> new ProducerRecord<>(
> "test-output-topic", null, element.getBytes()))
> .build(); {code}
>  * Kinesis consumer defined as: 
> {code:java}
> FlinkKinesisConsumer<ByteBuffer> flinkKinesisConsumer = new
> FlinkKinesisConsumer<>("test-stream",
> new AbstractDeserializationSchema<>() {
> @Override
> public ByteBuffer deserialize(byte[] bytes) {
> // Wrap the raw bytes; no further deserialization needed
> return ByteBuffer.wrap(bytes);
> }
> }, props); {code}
>  
>Reporter: Vadim Vararu
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: kinesis
>
> With an exactly-once Kinesis -> Flink -> Kafka job, triggering a 
> stop-with-savepoint makes Flink duplicate in Kafka, at resume, all the 
> records between the last checkpoint and the savepoint:
>  * Event1 is written to Kinesis
>  * Event1 is processed by Flink
>  * Event1 is committed to Kafka at the checkpoint
> 
>  * Event2 is written to Kinesis
>  * Event2 is processed by Flink
>  * Stop with savepoint is triggered manually
>  * Event2 is committed to Kafka
> 
>  * Job is resumed from the savepoint
>  * *{color:#FF}Event2 is written again to Kafka at the first 
> checkpoint{color}*
>  
> {color:#172b4d}I believe that it's a Kinesis connector issue, for 2 
> reasons:{color}
>  * I've checked the actual Kinesis sequence number in the _metadata file 
> generated at stop-with-savepoint, and it's the one from the checkpoint before 
> the savepoint instead of the one of the last record committed to Kafka.
>  * I've tested exactly the same job with Kafka as the source instead of 
> Kinesis, and the behaviour does not reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34025) Show data skew score on Flink Dashboard

2024-04-05 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-34025.
-
Resolution: Done

> Show data skew score on Flink Dashboard
> ---
>
> Key: FLINK-34025
> URL: https://issues.apache.org/jira/browse/FLINK-34025
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Web Frontend
>Affects Versions: 1.20.0
>Reporter: Emre Kartoglu
>Assignee: Emre Kartoglu
>Priority: Major
>  Labels: dashboard, pull-request-available
> Attachments: skew_proposal.png, skew_tab.png
>
>
> *Problem:* Currently users have to click on every operator and check how much 
> data each subtask is processing to see if there is data skew. This is 
> particularly cumbersome and error-prone for jobs with big job graphs. Data 
> skew is an important metric that should be more visible.
>  
> *Proposed solution:*
>  * Show a data skew score on each operator (see screenshot below). This would 
> be an improvement, but would not be sufficient, as it would still not be easy 
> to see the data skew score for jobs with very large job graphs (it'd require 
> a lot of zooming in/out).
>  * Show the data skew score for each operator under a new "Data Skew" tab 
> next to the Exceptions tab. See the screenshot below: 
> !skew_tab.png|width=1226,height=719!
>  
> !skew_proposal.png|width=845,height=253!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34025) Show data skew score on Flink Dashboard

2024-04-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834339#comment-17834339
 ] 

Hong Liang Teoh commented on FLINK-34025:
-

merged commit 
[{{080efb9}}|https://github.com/apache/flink/commit/080efb9c410102a5d12d31bb2af5a3faa3391736] 
into apache:master
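
For intuition, a skew score can be derived from per-subtask record counts. A 
minimal sketch of one plausible formulation (illustrative only; not necessarily 
the formula used by the merged change):

{code:java}
import java.util.List;

class SkewScore {
    // Hypothetical score: percentage gap between the busiest subtask and the
    // mean load. 0 means perfectly balanced; values near 100 mean one subtask
    // handles nearly all of the records.
    static double of(List<Long> recordsPerSubtask) {
        double max = recordsPerSubtask.stream().mapToLong(Long::longValue).max().orElse(0L);
        double avg = recordsPerSubtask.stream().mapToLong(Long::longValue).average().orElse(0.0);
        return max == 0.0 ? 0.0 : (max - avg) / max * 100.0;
    }
}
{code}

For example, subtask counts of (100, 100, 800) score about 58, flagging the hot 
subtask at a glance.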

> Show data skew score on Flink Dashboard
> ---
>
> Key: FLINK-34025
> URL: https://issues.apache.org/jira/browse/FLINK-34025
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Web Frontend
>Affects Versions: 1.20.0
>Reporter: Emre Kartoglu
>Assignee: Emre Kartoglu
>Priority: Major
>  Labels: dashboard, pull-request-available
> Attachments: skew_proposal.png, skew_tab.png
>
>
> *Problem:* Currently users have to click on every operator and check how much 
> data each subtask is processing to see if there is data skew. This is 
> particularly cumbersome and error-prone for jobs with big job graphs. Data 
> skew is an important metric that should be more visible.
>  
> *Proposed solution:*
>  * Show a data skew score on each operator (see screenshot below). This would 
> be an improvement, but would not be sufficient, as it would still not be easy 
> to see the data skew score for jobs with very large job graphs (it'd require 
> a lot of zooming in/out).
>  * Show the data skew score for each operator under a new "Data Skew" tab 
> next to the Exceptions tab. See the screenshot below: 
> !skew_tab.png|width=1226,height=719!
>  
> !skew_proposal.png|width=845,height=253!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34025) Show data skew score on Flink Dashboard

2024-04-05 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-34025:
---

Assignee: Emre Kartoglu

> Show data skew score on Flink Dashboard
> ---
>
> Key: FLINK-34025
> URL: https://issues.apache.org/jira/browse/FLINK-34025
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Web Frontend
>Affects Versions: 1.20.0
>Reporter: Emre Kartoglu
>Assignee: Emre Kartoglu
>Priority: Major
>  Labels: dashboard, pull-request-available
> Attachments: skew_proposal.png, skew_tab.png
>
>
> *Problem:* Currently users have to click on every operator and check how much 
> data each subtask is processing to see if there is data skew. This is 
> particularly cumbersome and error-prone for jobs with big job graphs. Data 
> skew is an important metric that should be more visible.
>  
> *Proposed solution:*
>  * Show a data skew score on each operator (see screenshot below). This would 
> be an improvement, but would not be sufficient, as it would still not be easy 
> to see the data skew score for jobs with very large job graphs (it'd require 
> a lot of zooming in/out).
>  * Show the data skew score for each operator under a new "Data Skew" tab 
> next to the Exceptions tab. See the screenshot below: 
> !skew_tab.png|width=1226,height=719!
>  
> !skew_proposal.png|width=845,height=253!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32964) KinesisStreamsSink cant renew credentials with WebIdentityTokenFileCredentialsProvider

2024-03-06 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-32964:

Fix Version/s: aws-connector-4.3.0

> KinesisStreamsSink cant renew credentials with 
> WebIdentityTokenFileCredentialsProvider
> --
>
> Key: FLINK-32964
> URL: https://issues.apache.org/jira/browse/FLINK-32964
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, 1.16.2, 1.17.1
>Reporter: PhilippeB
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.3.0
>
>
> (First time filing a ticket in the Flink community, please let me know if 
> there are any guidelines I need to follow.)
> I noticed a very strange behavior with the Kinesis Sink. I am running 
> Flink in containerized Application (reactive) mode on EKS with high 
> availability on S3. 
> Kinesis is configured with an IAM role and appropriate policies. 
> {code:java}
> // Part of my flink-config.yaml:
> parallelism.default: 2
> scheduler-mode: reactive
> execution.checkpointing.interval: 10s
> env.java.opts.jobmanager: -Dkubernetes.max.concurrent.requests=200
> containerized.master.env.KUBERNETES_MAX_CONCURRENT_REQUESTS: 200
> aws.credentials.provider: WEB_IDENTITY_TOKEN
> aws.credentials.role.arn: role
> aws.credentials.role.sessionName: session
> aws.credentials.webIdentityToken.file: 
> /var/run/secrets/eks.amazonaws.com/serviceaccount/token {code}
> When my project is deployed, the application and cluster work well, but once 
> the project has been running for about an hour, when I suppose the IAM role 
> session needs to be renewed, the job starts crashing continuously.
> {code:java}
> 2023-08-24 10:35:55
> java.lang.IllegalStateException: Connection pool shut down
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.util.Asserts.check(Asserts.java:34)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at 
> 

[jira] [Assigned] (FLINK-32964) KinesisStreamsSink cant renew credentials with WebIdentityTokenFileCredentialsProvider

2024-03-06 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-32964:
---

Assignee: Aleksandr Pilipenko

> KinesisStreamsSink cant renew credentials with 
> WebIdentityTokenFileCredentialsProvider
> --
>
> Key: FLINK-32964
> URL: https://issues.apache.org/jira/browse/FLINK-32964
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, 1.16.2, 1.17.1
>Reporter: PhilippeB
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> (First time filing a ticket in the Flink community, please let me know if 
> there are any guidelines I need to follow.)
> I noticed a very strange behavior with the Kinesis Sink. I am running 
> Flink in containerized Application (reactive) mode on EKS with high 
> availability on S3. 
> Kinesis is configured with an IAM role and appropriate policies. 
> {code:java}
> // Part of my flink-config.yaml:
> parallelism.default: 2
> scheduler-mode: reactive
> execution.checkpointing.interval: 10s
> env.java.opts.jobmanager: -Dkubernetes.max.concurrent.requests=200
> containerized.master.env.KUBERNETES_MAX_CONCURRENT_REQUESTS: 200
> aws.credentials.provider: WEB_IDENTITY_TOKEN
> aws.credentials.role.arn: role
> aws.credentials.role.sessionName: session
> aws.credentials.webIdentityToken.file: 
> /var/run/secrets/eks.amazonaws.com/serviceaccount/token {code}
> When my project is deployed, the application and cluster work well, but once 
> the project has been running for about an hour, when I suppose the IAM role 
> session needs to be renewed, the job starts crashing continuously.
> {code:java}
> 2023-08-24 10:35:55
> java.lang.IllegalStateException: Connection pool shut down
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.util.Asserts.check(Asserts.java:34)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at 
> 

[jira] [Resolved] (FLINK-32964) KinesisStreamsSink cant renew credentials with WebIdentityTokenFileCredentialsProvider

2024-03-06 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-32964.
-
Resolution: Fixed

> KinesisStreamsSink cant renew credentials with 
> WebIdentityTokenFileCredentialsProvider
> --
>
> Key: FLINK-32964
> URL: https://issues.apache.org/jira/browse/FLINK-32964
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, 1.16.2, 1.17.1
>Reporter: PhilippeB
>Priority: Major
>  Labels: pull-request-available
>
> (First time filing a ticket in the Flink community, please let me know if 
> there are any guidelines I need to follow.)
> I noticed a very strange behavior with the Kinesis Sink. I am running 
> Flink in containerized Application (reactive) mode on EKS with high 
> availability on S3. 
> Kinesis is configured with an IAM role and appropriate policies. 
> {code:java}
> // Part of my flink-config.yaml:
> parallelism.default: 2
> scheduler-mode: reactive
> execution.checkpointing.interval: 10s
> env.java.opts.jobmanager: -Dkubernetes.max.concurrent.requests=200
> containerized.master.env.KUBERNETES_MAX_CONCURRENT_REQUESTS: 200
> aws.credentials.provider: WEB_IDENTITY_TOKEN
> aws.credentials.role.arn: role
> aws.credentials.role.sessionName: session
> aws.credentials.webIdentityToken.file: 
> /var/run/secrets/eks.amazonaws.com/serviceaccount/token {code}
> When my project is deployed, the application and cluster work well, but once 
> the project has been running for about an hour, when I suppose the IAM role 
> session needs to be renewed, the job starts crashing continuously.
> {code:java}
> 2023-08-24 10:35:55
> java.lang.IllegalStateException: Connection pool shut down
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.util.Asserts.check(Asserts.java:34)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at 
> 

[jira] [Commented] (FLINK-32964) KinesisStreamsSink cant renew credentials with WebIdentityTokenFileCredentialsProvider

2024-02-15 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817750#comment-17817750
 ] 

Hong Liang Teoh commented on FLINK-32964:
-

merged commit 
[{{5e1d76d}}|https://github.com/apache/flink-connector-aws/commit/5e1d76d3d935627cc542fafef4df6c8604a3713d] 
into apache:main
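
For reference, the same credentials provider can also be supplied 
programmatically through the sink's client properties. A minimal sketch 
(region, stream name, and the String element type are placeholders, not taken 
from the ticket):

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.aws.config.AWSConfigConstants;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;

Properties clientProps = new Properties();
clientProps.put(AWSConfigConstants.AWS_REGION, "eu-west-1");
// Same provider that the flink-config.yaml in the ticket selects
clientProps.put(AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "WEB_IDENTITY_TOKEN");

KinesisStreamsSink<String> sink =
        KinesisStreamsSink.<String>builder()
                .setStreamName("output-stream")
                .setKinesisClientProperties(clientProps)
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                .build();
{code}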

> KinesisStreamsSink cant renew credentials with 
> WebIdentityTokenFileCredentialsProvider
> --
>
> Key: FLINK-32964
> URL: https://issues.apache.org/jira/browse/FLINK-32964
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, 1.16.2, 1.17.1
>Reporter: PhilippeB
>Priority: Major
>  Labels: pull-request-available
>
> (First time filing a ticket in the Flink community, please let me know if 
> there are any guidelines I need to follow.)
> I noticed a very strange behavior with the Kinesis Sink. I am running 
> Flink in containerized Application (reactive) mode on EKS with high 
> availability on S3. 
> Kinesis is configured with an IAM role and appropriate policies. 
> {code:java}
> // Part of my flink-config.yaml:
> parallelism.default: 2
> scheduler-mode: reactive
> execution.checkpointing.interval: 10s
> env.java.opts.jobmanager: -Dkubernetes.max.concurrent.requests=200
> containerized.master.env.KUBERNETES_MAX_CONCURRENT_REQUESTS: 200
> aws.credentials.provider: WEB_IDENTITY_TOKEN
> aws.credentials.role.arn: role
> aws.credentials.role.sessionName: session
> aws.credentials.webIdentityToken.file: 
> /var/run/secrets/eks.amazonaws.com/serviceaccount/token {code}
> When my project is deployed, the application and cluster work well, but once 
> the project has been running for about an hour, when I suppose the IAM role 
> session needs to be renewed, the job starts crashing continuously.
> {code:java}
> 2023-08-24 10:35:55
> java.lang.IllegalStateException: Connection pool shut down
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.util.Asserts.check(Asserts.java:34)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at 
> org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
>     at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at 
> 

[jira] [Resolved] (FLINK-34260) Update flink-connector-aws to be compatible with updated SinkV2 interfaces

2024-02-10 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-34260.
-
Resolution: Fixed

> Update flink-connector-aws to be compatible with updated SinkV2 interfaces
> --
>
> Key: FLINK-34260
> URL: https://issues.apache.org/jira/browse/FLINK-34260
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.3.0
>Reporter: Martijn Visser
>Assignee: Aleksandr Pilipenko
>Priority: Blocker
>  Labels: pull-request-available
>
> https://github.com/apache/flink-connector-aws/actions/runs/7689300085/job/20951547366#step:9:798
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-dynamodb: Compilation failure
> Error:  
> /home/runner/work/flink-connector-aws/flink-connector-aws/flink-connector-aws/flink-connector-dynamodb/src/test/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriterTest.java:[357,40]
>  incompatible types: 
> org.apache.flink.connector.base.sink.writer.TestSinkInitContext cannot be 
> converted to org.apache.flink.api.connector.sink2.Sink.InitContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34260) Update flink-connector-aws to be compatible with updated SinkV2 interfaces

2024-02-10 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816317#comment-17816317
 ] 

Hong Liang Teoh commented on FLINK-34260:
-

merged commit 
[{{5b6f087}}|https://github.com/apache/flink-connector-aws/commit/5b6f087815bcf18cf62ba39b2ac1f84f5e72f951] 
into apache:main

> Update flink-connector-aws to be compatible with updated SinkV2 interfaces
> --
>
> Key: FLINK-34260
> URL: https://issues.apache.org/jira/browse/FLINK-34260
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.3.0
>Reporter: Martijn Visser
>Assignee: Aleksandr Pilipenko
>Priority: Blocker
>  Labels: pull-request-available
>
> https://github.com/apache/flink-connector-aws/actions/runs/7689300085/job/20951547366#step:9:798
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-dynamodb: Compilation failure
> Error:  
> /home/runner/work/flink-connector-aws/flink-connector-aws/flink-connector-aws/flink-connector-dynamodb/src/test/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriterTest.java:[357,40]
>  incompatible types: 
> org.apache.flink.connector.base.sink.writer.TestSinkInitContext cannot be 
> converted to org.apache.flink.api.connector.sink2.Sink.InitContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34407) Flaky tests causing workflow timeout

2024-02-10 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816313#comment-17816313
 ] 

Hong Liang Teoh commented on FLINK-34407:
-

merged commit 
[{{b207606}}|https://github.com/apache/flink-connector-aws/commit/b207606a95d0ce508c55e69dd0dc6c598eb2fb3c] 
into apache:main

> Flaky tests causing workflow timeout
> 
>
> Key: FLINK-34407
> URL: https://issues.apache.org/jira/browse/FLINK-34407
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> Example build: 
> [https://github.com/apache/flink-connector-aws/actions/runs/7735404733]
> Tests are stuck retrying due to the following exception:
> {code:java}
> 797445 [main] WARN  
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher
>  [] - Encountered recoverable error TimeoutException. Backing off for 0 
> millis 00 (arn)
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber$RecoverableFanOutSubscriberException:
>  java.util.concurrent.TimeoutException: Timed out acquiring subscription - 
> 00 (arn)
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.handleErrorAndRethrow(FanOutShardSubscriber.java:327)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.openSubscriptionToShard(FanOutShardSubscriber.java:283)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.subscribeToShardAndConsumeRecords(FanOutShardSubscriber.java:210)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher.runWithBackoff(FanOutRecordPublisher.java:177)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher.run(FanOutRecordPublisher.java:130)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisherTest.testCancelExitsGracefully(FanOutRecordPublisherTest.java:595)
>  ~[test-classes/:?]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_402]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34407) Flaky tests causing workflow timeout

2024-02-10 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-34407.
-
Resolution: Fixed

> Flaky tests causing workflow timeout
> 
>
> Key: FLINK-34407
> URL: https://issues.apache.org/jira/browse/FLINK-34407
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> Example build: 
> [https://github.com/apache/flink-connector-aws/actions/runs/7735404733]
> Tests are stuck retrying due to the following exception:
> {code:java}
> 797445 [main] WARN  
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher
>  [] - Encountered recoverable error TimeoutException. Backing off for 0 
> millis 00 (arn)
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber$RecoverableFanOutSubscriberException:
>  java.util.concurrent.TimeoutException: Timed out acquiring subscription - 
> 00 (arn)
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.handleErrorAndRethrow(FanOutShardSubscriber.java:327)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.openSubscriptionToShard(FanOutShardSubscriber.java:283)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutShardSubscriber.subscribeToShardAndConsumeRecords(FanOutShardSubscriber.java:210)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher.runWithBackoff(FanOutRecordPublisher.java:177)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisher.run(FanOutRecordPublisher.java:130)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.FanOutRecordPublisherTest.testCancelExitsGracefully(FanOutRecordPublisherTest.java:595)
>  ~[test-classes/:?]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_402]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34071) Deadlock in AWS Kinesis Data Streams AsyncSink connector

2024-01-15 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806733#comment-17806733
 ] 

Hong Liang Teoh commented on FLINK-34071:
-

As discussed offline, I wonder if we can add a timeout to handle this. Maybe we 
can consider following a similar pattern to Async I/O:

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#timeout-handling]

This will need a FLIP if we introduce a new configuration.
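
For reference, Async I/O avoids hanging by giving the user function an explicit 
timeout callback. A minimal sketch of that pattern (hypothetical lookup 
function, illustration only, not the actual sink fix; "source" is assumed to be 
an existing DataStream<String>):

{code:java}
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

class TimedLookup extends RichAsyncFunction<String, String> {
    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        // Fire the async request; complete the future when the response arrives.
        CompletableFuture
                .supplyAsync(input::toUpperCase)
                .thenAccept(result -> resultFuture.complete(Collections.singleton(result)));
    }

    @Override
    public void timeout(String input, ResultFuture<String> resultFuture) {
        // Invoked by the framework when no result arrived in time: emit a
        // fallback (or fail) instead of leaving the operator stuck forever.
        resultFuture.complete(Collections.singleton("TIMED_OUT:" + input));
    }
}

// At most 100 in-flight requests, with a 1-second timeout per element.
DataStream<String> enriched =
        AsyncDataStream.unorderedWait(source, new TimedLookup(), 1, TimeUnit.SECONDS, 100);
{code}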

> Deadlock in AWS Kinesis Data Streams AsyncSink connector
> 
>
> Key: FLINK-34071
> URL: https://issues.apache.org/jira/browse/FLINK-34071
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS, Connectors / Kinesis
>Affects Versions: aws-connector-3.0.0, 1.15.4, aws-connector-4.2.0
>Reporter: Aleksandr Pilipenko
>Priority: Major
>
> Sink operator hangs while flushing records, similarly to FLINK-32230. The 
> error is observed even when using an AWS SDK version that contains the fix 
> for async client error handling: [https://github.com/aws/aws-sdk-java-v2/pull/4402]
> Thread dump of stuck thread:
> {code:java}
> "sdk-async-response-1-6236" Id=11213 RUNNABLE
> at 
> app//org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.lambda$flush$5(AsyncSinkWriter.java:385)
> at 
> app//org.apache.flink.connector.base.sink.writer.AsyncSinkWriter$$Lambda$1778/0x000801141040.accept(Unknown
>  Source)
> at 
> org.apache.flink.connector.kinesis.sink.KinesisStreamsSinkWriter.handleFullyFailedRequest(KinesisStreamsSinkWriter.java:210)
> at 
> org.apache.flink.connector.kinesis.sink.KinesisStreamsSinkWriter.lambda$submitRequestEntries$1(KinesisStreamsSinkWriter.java:184)
> at 
> org.apache.flink.connector.kinesis.sink.KinesisStreamsSinkWriter$$Lambda$1965/0x0008011a0c40.accept(Unknown
>  Source)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.CompletableFutureUtils$$Lambda$1925/0x000801181840.accept(Unknown
>  Source)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallMetricCollectionStage.lambda$execute$0(AsyncApiCallMetricCollectionStage.java:56)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallMetricCollectionStage$$Lambda$1961/0x000801191c40.accept(Unknown
>  Source)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallTimeoutTrackingStage.lambda$execute$2(AsyncApiCallTimeoutTrackingStage.java:67)
> at 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallTimeoutTrackingStage$$Lambda$1960/0x000801191840.accept(Unknown
>  Source)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base@11.0.18/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at 
> 

[jira] [Assigned] (FLINK-33995) Add test in test_file_sink.sh s3 StreamingFileSink for csv

2024-01-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33995:
---

Assignee: Samrat Deb

> Add test in test_file_sink.sh s3 StreamingFileSink for csv 
> ---
>
> Key: FLINK-33995
> URL: https://issues.apache.org/jira/browse/FLINK-33995
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Reporter: Samrat Deb
>Assignee: Samrat Deb
>Priority: Major
>
> test_file_sink.sh s3 StreamingFileSink doesn't have coverage for the csv 
> format. This task will add a new test case to cover the case when the format 
> is `csv`.
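
For context, a row-format file sink that writes pre-formatted CSV lines could 
look like the sketch below (bucket path is a placeholder; the e2e test's actual 
wiring may differ):

{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;

// Each incoming record is assumed to already be a CSV line, e.g. "id,name,value".
FileSink<String> csvSink =
        FileSink.forRowFormat(
                        new Path("s3://test-bucket/csv-output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();
{code}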



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-19 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-33872.
-
Resolution: Fixed

> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
> history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history does not show up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=862,height=295!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=860,height=258!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798635#comment-17798635
 ] 

Hong Liang Teoh commented on FLINK-33872:
-

merged commit 
[{{54fd94a}}|https://github.com/apache/flink/commit/54fd94ab750c6cbb4cfa21a911656235c1bb059b]
 into apache:release-1.18

> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
> history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history does not show up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=862,height=295!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=860,height=258!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798634#comment-17798634
 ] 

Hong Liang Teoh commented on FLINK-33872:
-

merged commit 
[{{1e89e2f}}|https://github.com/apache/flink/commit/1e89e2fad33df84e33d6276f8fafd38957cfbd47]
 into apache:master

 

> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
> history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history does not show up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=862,height=295!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=860,height=258!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-18 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33872:
---

Assignee: Hong Liang Teoh

> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
> Fix For: 1.19.0, 1.18.2
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
> history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history does not show up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=862,height=295!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=860,height=258!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-18 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33872:
---

 Summary: Checkpoint history does not display for completed jobs
 Key: FLINK-33872
 URL: https://issues.apache.org/jira/browse/FLINK-33872
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.18.0
Reporter: Hong Liang Teoh
 Fix For: 1.19.0, 1.18.2
 Attachments: image-2023-12-18-11-37-11-914.png, 
image-2023-12-18-11-37-29-596.png

Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
does not show up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
Could not find Flink job (  )"

*Snapshot of failure:*

When the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png!
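
For step 2, any job with checkpointing enabled reproduces this. A minimal 
sketch (job name, source, and interval are arbitrary):

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointHistoryRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Short interval so step 3 ("at least 1 checkpoint completes") happens quickly.
        env.enableCheckpointing(1000L);
        env.fromSequence(0L, Long.MAX_VALUE)
                .map(i -> i * 2)
                .print();
        env.execute("checkpoint-history-repro");
    }
}
{code}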



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-18 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-33872:

Description: 
Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
does not show up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
Could not find Flink job (  )"

*Snapshot of failure:*

When the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png|width=862,height=295!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png|width=860,height=258!

  was:
Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
does not show up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
Could not find Flink job (  )"

*Snapshot of failure:*

When the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png|width=1023,height=350!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png|width=1023,height=307!


> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Priority: Major
> Fix For: 1.19.0, 1.18.2
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we see checkpoint 
> history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history does not show up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=862,height=295!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=860,height=258!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-18 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-33872:

Description: 
Prior to https://issues.apache.org/jira/browse/FLINK-32469, we could see the 
checkpoint history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
no longer shows up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel the job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException: Could not 
find Flink job (  )"

*Snapshot of failure:*

When the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png|width=1023,height=350!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png|width=1023,height=307!

  was:
Prior to https://issues.apache.org/jira/browse/FLINK-32469, we could see the 
checkpoint history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
no longer shows up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel the job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException: Could not 
find Flink job (  )"

*Snapshot of failure:*

When the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png!


> Checkpoint history does not display for completed jobs
> --
>
> Key: FLINK-33872
> URL: https://issues.apache.org/jira/browse/FLINK-33872
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.18.0
>Reporter: Hong Liang Teoh
>Priority: Major
> Fix For: 1.19.0, 1.18.2
>
> Attachments: image-2023-12-18-11-37-11-914.png, 
> image-2023-12-18-11-37-29-596.png
>
>
> Prior to https://issues.apache.org/jira/browse/FLINK-32469, we could see the 
> checkpoint history for completed jobs (CANCELED, FAILED, FINISHED).
> After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint 
> history no longer shows up for completed jobs. 
> *Reproduction steps:*
>  # Start a Flink cluster.
>  # Submit a job with checkpointing enabled.
>  # Wait until at least 1 checkpoint completes.
>  # Cancel the job.
>  # Open the Flink dashboard > Job > Checkpoints > History.
> We will see a log line in the JobManager saying "FlinkJobNotFoundException: 
> Could not find Flink job (  )"
> *Snapshot of failure:*
> When the job is running, we can see checkpoints.
> !image-2023-12-18-11-37-11-914.png|width=1023,height=350!
> When the job has been CANCELLED, we no longer see checkpoint data.
> !image-2023-12-18-11-37-29-596.png|width=1023,height=307!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-6349) Enforce per-subtask record ordering on resharding for FlinkKinesisConsumer

2023-11-24 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789436#comment-17789436
 ] 

Hong Liang Teoh commented on FLINK-6349:


This will be addressed by the new FLIP-27-based KinesisSource, tracked by 
FLINK-32218.

> Enforce per-subtask record ordering on resharding for FlinkKinesisConsumer
> --
>
> Key: FLINK-6349
> URL: https://issues.apache.org/jira/browse/FLINK-6349
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> As described in FLINK-6316, the Kinesis consumer currently does not provide 
> any ordering guarantees when resharding occurs.
> While this cannot be enforced globally (i.e. if a merged / split shard's 
> child shard ends up in a different subtask, we cannot coordinate across 
> subtasks to guarantee ordering), we can definitely enforce it locally for 
> each subtask. Simply put, we can still enforce ordering locally by making 
> sure that discovered child shards are consumed only after any of their 
> parent shards that were on the same subtask are fully consumed.
> To do this, we would also need to add "parent shard" information to 
> {{KinesisStreamShard}} (Flink's representation of Kinesis shards).
> This would be directly beneficial for per-shard watermarks (FLINK-5697) to 
> retain per-shard time characteristics after a reshard, and therefore can be 
> seen as a prerequisite.
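
A rough sketch of the local gating described above (the parent-shard accessor 
is exactly the addition this ticket proposes for {{KinesisStreamShard}}, so 
all names here are illustrative, not the connector's actual API):

{code:java}
import java.util.Set;

public class ShardGate {
    /**
     * A discovered child shard may be consumed only once every parent shard
     * that this subtask owns has been fully consumed. Parents owned by other
     * subtasks are ignored, since ordering is only enforced locally.
     */
    public static boolean canConsume(
            Set<String> parentShardIds,        // from the proposed parent-shard accessor
            Set<String> ownedShardIds,         // shards assigned to this subtask
            Set<String> fullyConsumedShardIds) {
        for (String parentId : parentShardIds) {
            if (ownedShardIds.contains(parentId)
                    && !fullyConsumedShardIds.contains(parentId)) {
                return false; // a local parent still has records to drain
            }
        }
        return true;
    }
}
{code}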



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-6349) Enforce per-subtask record ordering on resharding for FlinkKinesisConsumer

2023-11-24 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-6349:
---
Priority: Major  (was: Not a Priority)

> Enforce per-subtask record ordering on resharding for FlinkKinesisConsumer
> --
>
> Key: FLINK-6349
> URL: https://issues.apache.org/jira/browse/FLINK-6349
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> As described in FLINK-6316, the Kinesis consumer currently does not provide 
> any ordering guarantees when resharding occurs.
> While this cannot be enforced globally (i.e. if a merged / split shard's 
> child shard ends up in a different subtask, we cannot coordinate across 
> subtasks to guarantee ordering), we can definitely enforce it locally for 
> each subtask. Simply put, we can still enforce ordering locally by making 
> sure that discovered child shards are consumed only after any of their 
> parent shards that were on the same subtask are fully consumed.
> To do this, we would also need to add "parent shard" information to 
> {{KinesisStreamShard}} (Flink's representation of Kinesis shards).
> This would be directly beneficial for per-shard watermarks (FLINK-5697) to 
> retain per-shard time characteristics after a reshard, and therefore can be 
> seen as a prerequisite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33194) AWS Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-27 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33194:
---

Assignee: Jiabao Sun

> AWS Connector should directly depend on 3rd-party libs instead of 
> flink-shaded repo
> ---
>
> Key: FLINK-33194
> URL: https://issues.apache.org/jira/browse/FLINK-33194
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33194) AWS Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-27 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-33194.
-
Resolution: Fixed

> AWS Connector should directly depend on 3rd-party libs instead of 
> flink-shaded repo
> ---
>
> Key: FLINK-33194
> URL: https://issues.apache.org/jira/browse/FLINK-33194
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33194) AWS Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-27 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780273#comment-17780273
 ] 

Hong Liang Teoh commented on FLINK-33194:
-

merged commit 
[{{c244bb3}}|https://github.com/apache/flink-connector-aws/commit/c244bb30240399f67557bbaf8e718d5eb158199d]
 into apache:main 

> AWS Connector should directly depend on 3rd-party libs instead of 
> flink-shaded repo
> ---
>
> Key: FLINK-33194
> URL: https://issues.apache.org/jira/browse/FLINK-33194
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32399) Fix flink-sql-connector-kinesis build for maven 3.8.6+

2023-10-12 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774628#comment-17774628
 ] 

Hong Liang Teoh commented on FLINK-32399:
-

Ah, that's a good call. We will need to make some tweaks, because we recently 
added an AWS end-to-end test step that runs when AWS credentials are configured 
(to test the connectors against real AWS services).

But I've created a Jira for this! 
https://issues.apache.org/jira/browse/FLINK-33259

> Fix flink-sql-connector-kinesis build for maven 3.8.6+
> --
>
> Key: FLINK-32399
> URL: https://issues.apache.org/jira/browse/FLINK-32399
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-3.0.0, aws-connector-4.0.0, 
> aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Fix build for sql kinesis connector with maven versions 3.8.6 and above.
> Currently, when using maven newer than 3.8.5, the resulting jar will contain 
> dependencies of flink-connector-kinesis along with relocated versions of 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33259) flink-connector-aws should use/extend the common connector workflow

2023-10-12 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33259:
---

 Summary: flink-connector-aws should use/extend the common 
connector workflow
 Key: FLINK-33259
 URL: https://issues.apache.org/jira/browse/FLINK-33259
 Project: Flink
  Issue Type: Technical Debt
Reporter: Hong Liang Teoh


We should use the common CI GitHub workflow:
[https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/ci.yml]

Example usage in flink-connector-elasticsearch:

[https://github.com/apache/flink-connector-elasticsearch/blob/main/.github/workflows/push_pr.yml]

This improves our operational stance, because we will automatically inherit 
any improvements/changes to the main CI workflow file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32399) Fix flink-sql-connector-kinesis build for maven 3.8.6+

2023-10-12 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-32399:
---

Assignee: Aleksandr Pilipenko

> Fix flink-sql-connector-kinesis build for maven 3.8.6+
> --
>
> Key: FLINK-32399
> URL: https://issues.apache.org/jira/browse/FLINK-32399
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-3.0.0, aws-connector-4.0.0, 
> aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Fix build for sql kinesis connector with maven versions 3.8.6 and above.
> Currently, when using maven newer than 3.8.5, the resulting jar will contain 
> dependencies of flink-connector-kinesis along with relocated versions of 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32399) Fix flink-sql-connector-kinesis build for maven 3.8.6+

2023-10-12 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-32399.
-
Resolution: Fixed

> Fix flink-sql-connector-kinesis build for maven 3.8.6+
> --
>
> Key: FLINK-32399
> URL: https://issues.apache.org/jira/browse/FLINK-32399
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-3.0.0, aws-connector-4.0.0, 
> aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Fix build for sql kinesis connector with maven versions 3.8.6 and above.
> Currently, when using maven newer than 3.8.5, the resulting jar will contain 
> dependencies of flink-connector-kinesis along with relocated versions of 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32399) Fix flink-sql-connector-kinesis build for maven 3.8.6+

2023-10-12 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774539#comment-17774539
 ] 

Hong Liang Teoh commented on FLINK-32399:
-

merged commit 
[{{66826b7}}|https://github.com/apache/flink-connector-aws/commit/66826b786dadf163dcca2279dc6b79c1a169521e]
 into apache:main 
[now|https://github.com/apache/flink-connector-aws/pull/87#event-10632082219]

> Fix flink-sql-connector-kinesis build for maven 3.8.6+
> --
>
> Key: FLINK-32399
> URL: https://issues.apache.org/jira/browse/FLINK-32399
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kinesis
>Affects Versions: aws-connector-3.0.0, aws-connector-4.0.0, 
> aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Fix build for sql kinesis connector with maven versions 3.8.6 and above.
> Currently, when using maven newer than 3.8.5, the resulting jar will contain 
> dependencies of flink-connector-kinesis along with relocated versions of 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33141) Prometheus Sink Connector - Amazon Managed Prometheus Request Signer

2023-10-09 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-33141.
-
Resolution: Implemented

Resolved by https://issues.apache.org/jira/browse/FLINK-33138

> Prometheus Sink Connector - Amazon Managed Prometheus Request Signer
> -
>
> Key: FLINK-33141
> URL: https://issues.apache.org/jira/browse/FLINK-33141
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Reporter: Lorenzo Nicora
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33138) Prometheus Connector Sink - DataStream API implementation

2023-10-09 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33138:
---

Assignee: Lorenzo Nicora

> Prometheus Connector Sink - DataStream API implementation
> -
>
> Key: FLINK-33138
> URL: https://issues.apache.org/jira/browse/FLINK-33138
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Prometheus
>Reporter: Lorenzo Nicora
>Assignee: Lorenzo Nicora
>Priority: Major
> Fix For: prometheus-connector-1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33175) Nightly builds from S3 are not available for download, breaking all connector tests

2023-10-09 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773294#comment-17773294
 ] 

Hong Liang Teoh commented on FLINK-33175:
-

Verified that the flink-connector-aws build succeeds as well. Thank you for 
looking into this, everyone!

 

https://github.com/apache/flink-connector-aws/actions/runs/6450772599/job/17520032785

> Nightly builds from S3 are not available for download, breaking all connector 
> tests
> ---
>
> Key: FLINK-33175
> URL: https://issues.apache.org/jira/browse/FLINK-33175
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Reporter: Martijn Visser
>Assignee: Jing Ge
>Priority: Blocker
>
> All downloads of Flink binaries fail with:
> {code:java}
> Run wget -q -c 
> https://s3.amazonaws.com/flink-nightly/flink-1.18-SNAPSHOT-bin-scala_2.12.tgz 
> -O - | tar -xz
> gzip: stdin: unexpected end of file
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> Error: Process completed with exit code 2.
> {code}
> This goes for 1.18, but also 1.17 and 1.16



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33167) Run IT tests against Kinesalite if AWS credentials are not present

2023-09-28 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33167:
---

 Summary: Run IT tests against Kinesalite if AWS credentials are 
not present
 Key: FLINK-33167
 URL: https://issues.apache.org/jira/browse/FLINK-33167
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / AWS, Connectors / Kinesis
Reporter: Hong Liang Teoh


*What*

We want to run Kinesis IT tests against Kinesalite if there are no AWS 
credentials present. 

 

*Why*

We want maximum test coverage (e.g. on PR builds we don't have AWS credentials, 
so we run against Kinesalite to catch mistakes in PRs early).
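
A minimal sketch of the selection logic (the environment-variable names and the 
local Kinesalite endpoint are assumptions, not the connector's actual wiring):

{code:java}
public class TestEndpointSelector {
    // Assumption: standard AWS SDK credential environment variables.
    static boolean awsCredsPresent() {
        return System.getenv("AWS_ACCESS_KEY_ID") != null
                && System.getenv("AWS_SECRET_ACCESS_KEY") != null;
    }

    /** Returns null for the default (real) AWS endpoint, else a local Kinesalite URL. */
    static String kinesisEndpointOverride() {
        return awsCredsPresent() ? null : "https://localhost:4567"; // Kinesalite default port (assumption)
    }
}
{code}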



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33137) FLIP-312: Prometheus Sink Connector

2023-09-25 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33137:
---

Assignee: Lorenzo Nicora

> FLIP-312: Prometheus Sink Connector
> ---
>
> Key: FLINK-33137
> URL: https://issues.apache.org/jira/browse/FLINK-33137
> Project: Flink
>  Issue Type: New Feature
>Reporter: Lorenzo Nicora
>Assignee: Lorenzo Nicora
>Priority: Major
>  Labels: Connector
>
> Umbrella Jira for implementation of Prometheus Sink Connector
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-312:+Prometheus+Sink+Connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-19 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-32962.
-
Resolution: Fixed

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 
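
For context, newer pip rejects the {{--install-option}} indirection shown in the 
trace above; the value can be passed straight through. A minimal sketch of the 
corrected invocation (paths and the executable name are placeholders, not the 
actual code in PythonEnvironmentManagerUtils):

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class PipInstallSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Pass --prefix directly instead of wrapping it in the removed
        // --install-option flag, which newer pip rejects.
        List<String> cmd = Arrays.asList(
                "python", "-m", "pip", "install",
                "--ignore-installed",
                "-r", "/tmp/requirements.txt",     // placeholder path
                "--prefix", "/tmp/python-prefix"); // placeholder path
        int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("pip exited with " + exitCode);
    }
}
{code}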

[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766736#comment-17766736
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

merged commit 
[{{f786dbe}}|https://github.com/apache/flink/commit/f786dbea0be25cc62dc10311a8d64d9de2ee20dd]
 into apache:release-1.17 

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> 

[jira] [Assigned] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33073:
---

Assignee: Hong Liang Teoh

> Implement end-to-end tests for the Kinesis Streams Sink
> ---
>
> Key: FLINK-33073
> URL: https://issues.apache.org/jira/browse/FLINK-33073
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> *What*
> Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33072) Implement end-to-end tests for AWS Kinesis Connectors

2023-09-11 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-33072:

Description: 
*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
the Kinesis Data Streams API.

We especially want to test happy-path and failure cases to ensure they are 
handled as expected by the KDS connector.

 

Reference: https://issues.apache.org/jira/browse/INFRA-24474

 

  was:
*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
the Kinesis Data Streams API.

We especially want to test happy-path and failure cases to ensure they are 
handled as expected by the KDS connector.

 


> Implement end-to-end tests for AWS Kinesis Connectors
> -
>
> Key: FLINK-33072
> URL: https://issues.apache.org/jira/browse/FLINK-33072
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / AWS
>Reporter: Hong Liang Teoh
>Priority: Major
> Fix For: 2.0.0
>
>
> *What*
> We want to implement end-to-end tests that target real Kinesis Data Streams.
> *Why*
> This solidifies our testing to ensure we pick up any integration issues with 
> the Kinesis Data Streams API.
> We especially want to test happy-path and failure cases to ensure they are 
> handled as expected by the KDS connector.
>  
> Reference: https://issues.apache.org/jira/browse/INFRA-24474
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33073:
---

 Summary: Implement end-to-end tests for the Kinesis Streams Sink
 Key: FLINK-33073
 URL: https://issues.apache.org/jira/browse/FLINK-33073
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33072) Implement end-to-end tests for AWS Kinesis Connectors

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33072:
---

 Summary: Implement end-to-end tests for AWS Kinesis Connectors
 Key: FLINK-33072
 URL: https://issues.apache.org/jira/browse/FLINK-33072
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
the Kinesis Data Streams API.

We especially want to test happy-path and failure cases to ensure they are 
handled as expected by the KDS connector.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-08 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-32962:

Fix Version/s: 1.17.2

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 

[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-08 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763052#comment-17763052
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

merged commit 
[{{7f99a27}}|https://github.com/apache/flink/commit/7f99a274f9912513479a6eeb8ed82d721a8aeb7f]
 into apache:release-1.18

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> 

[jira] [Updated] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-08 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-32962:

Fix Version/s: 1.18.0, 1.19.0

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 

[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762807#comment-17762807
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

[~a.pilipenko] should we also backport this to Flink 1.17 and 1.18?

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 

[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762805#comment-17762805
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

merged commit 
[{{1193e17}}|https://github.com/apache/flink/commit/1193e178f6c478bf493231fc53c74f03158c0ca9]
 into apache:master

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from the requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 

[jira] [Commented] (FLINK-33021) AWS nightly builds fails on architecture tests

2023-09-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762042#comment-17762042
 ] 

Hong Liang Teoh commented on FLINK-33021:
-

merged commit 
[{{8b0ae0f}}|https://github.com/apache/flink-connector-aws/commit/8b0ae0f45fea40beb52e12e6a25ede6003bef1be]
 into apache:main

> AWS nightly builds fails on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Assignee: Hong Liang Teoh
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}
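For anyone hitting this rule: the predicate quoted above is satisfied by a static final {{MiniClusterExtension}} field annotated with {{@RegisterExtension}}. A minimal sketch of that shape follows; the class name and cluster sizing are illustrative:

{code:java}
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.junit5.MiniClusterExtension;
import org.junit.jupiter.api.extension.RegisterExtension;

class ExampleSinkITCase {
    // Static, final, of type MiniClusterExtension, and annotated with
    // @RegisterExtension -- the combination the ArchUnit rule looks for.
    @RegisterExtension
    static final MiniClusterExtension MINI_CLUSTER =
            new MiniClusterExtension(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(2)
                            .build());
}
{code}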





[jira] [Resolved] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-05 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-33021.
-
Resolution: Fixed

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Assignee: Hong Liang Teoh
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Assigned] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-05 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33021:
---

Assignee: Hong Liang Teoh

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Assignee: Hong Liang Teoh
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-05 Thread Hong Liang Teoh (Jira)


[ https://issues.apache.org/jira/browse/FLINK-33021 ]


Hong Liang Teoh deleted comment on FLINK-33021:
-

was (Author: hong):
Triggering the nightly build now. 
https://github.com/apache/flink-connector-aws/actions/runs/6067488560

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Commented] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762040#comment-17762040
 ] 

Hong Liang Teoh commented on FLINK-33021:
-

Latest nightly build has passed. Will resolve this JIRA

https://github.com/apache/flink-connector-aws/actions/runs/6078564261

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762039#comment-17762039
 ] 

Hong Liang Teoh commented on FLINK-28513:
-

merged commit 
[{{2437cf5}}|https://github.com/apache/flink/commit/2437cf568785a05ece70fde9f917637731740e46]
 into apache:release-1.18

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.
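As the exception message itself suggests, the S3 recoverable stream rejects {{sync()}}; {{persist()}} is the call that creates a durable, resumable intermediate point. A minimal sketch of that API follows (illustrative names, not the eventual fix):

{code:java}
import java.io.IOException;
import org.apache.flink.core.fs.RecoverableFsDataOutputStream;
import org.apache.flink.core.fs.RecoverableWriter;

class PersistSketch {
    // Writes a payload and persists a resumable point instead of calling
    // sync(), which S3RecoverableFsDataOutputStream does not support.
    static RecoverableWriter.ResumeRecoverable writeAndPersist(
            RecoverableFsDataOutputStream out, byte[] payload) throws IOException {
        out.write(payload);
        return out.persist();
    }
}
{code}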





[jira] [Updated] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-33021:

Fix Version/s: aws-connector-4.2.0

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Commented] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761861#comment-17761861
 ] 

Hong Liang Teoh commented on FLINK-33021:
-

Triggering the nightly build now. 
https://github.com/apache/flink-connector-aws/actions/runs/6067488560

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
> Fix For: aws-connector-4.2.0
>
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Commented] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761860#comment-17761860
 ] 

Hong Liang Teoh commented on FLINK-33021:
-

Ah. I only saw this now :( But I'll link the PR for reference!

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Comment Edited] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761856#comment-17761856
 ] 

Hong Liang Teoh edited comment on FLINK-28513 at 9/4/23 3:17 PM:
-

{quote}I don't have access to update the fix version. Please help updating the 
`Fix Version/s` for this issue
{quote}
Updated


was (Author: hong):
> I don't have access to update the fix version. Please help updating the `Fix 
> Version/s` for this issue
 
Updated

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Updated] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-28513:

Fix Version/s: 1.18.0
   1.17.2

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761856#comment-17761856
 ] 

Hong Liang Teoh commented on FLINK-28513:
-

> I don't have access to update the fix version. Please help updating the `Fix 
> Version/s` for this issue
 
Updated

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0, 1.17.2, 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761855#comment-17761855
 ] 

Hong Liang Teoh commented on FLINK-28513:
-

merged commit 
[{{d06a297}}|https://github.com/apache/flink/commit/d06a297422fd4884aa21655fdf1f1bce94cc3a8a]
 into apache:release-1.17

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761787#comment-17761787
 ] 

Hong Liang Teoh commented on FLINK-28513:
-

[~samrat007]  Could we backport this bugfix to Flink 1.17 and 1.18 as well?

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Resolved] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-28513.
-
Fix Version/s: 1.19.0
   Resolution: Fixed

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761785#comment-17761785
 ] 

Hong Liang Teoh commented on FLINK-28513:
-

 merged commit 
[{{e921489}}|https://github.com/apache/flink/commit/e921489279ca70b179521ec4619514725b061491]
 into apache:master

> Flink Table API CSV streaming sink throws SerializedThrowable exception
> ---
>
> Key: FLINK-28513
> URL: https://issues.apache.org/jira/browse/FLINK-28513
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Table SQL / API
>Affects Versions: 1.15.1
>Reporter: Jaya Ananthram
>Assignee: Samrat Deb
>Priority: Critical
>  Labels: pull-request-available, stale-assigned
>
> Table API S3 streaming sink (CSV format) throws the following exception,
> {code:java}
> Caused by: org.apache.flink.util.SerializedThrowable: 
> S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to 
> create a persistent recoverable intermediate point.
> at 
> org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
>  ~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
> at org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110) 
> ~[flink-csv-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
>  ~[flink-file-sink-common-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
>  ~[flink-streaming-java-1.15.1.jar:1.15.1]
> at 
> org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
>  ~[flink-connector-files-1.15.1.jar:1.15.1]
> {code}
> In my table config, I read from Kafka and write to S3 (s3a) using the Table 
> API, with checkpointing configured via s3p (Presto). I even tried a simple 
> datagen example instead of Kafka, with the local file system for 
> checkpointing (`file:///` instead of `s3p://`), and I get the same issue: it 
> fails exactly when the code triggers a checkpoint.
> Related Slack and Stack Overflow conversations are 
> [here|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1657609776339389], 
> [here|https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming]
>  and 
> [here|https://stackoverflow.com/questions/72943730/flink-table-api-streaming-s3-sink-throws-serializedthrowable-exception]
> Since there is no workaround for the S3 Table API streaming sink, I am marking 
> this as critical. If this is not the right severity, feel free to reduce the 
> priority.





[jira] [Comment Edited] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761740#comment-17761740
 ] 

Hong Liang Teoh edited comment on FLINK-33021 at 9/4/23 9:01 AM:
-

This has been fixed here! 

https://github.com/apache/flink-connector-aws/pull/92


was (Author: JIRAUSER292614):
This has been fixed here! 

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Commented] (FLINK-33021) AWS nightly builds fail on architecture tests

2023-09-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761740#comment-17761740
 ] 

Hong Liang Teoh commented on FLINK-33021:
-

This has been fixed here! 

> AWS nightly builds fail on architecture tests
> --
>
> Key: FLINK-33021
> URL: https://issues.apache.org/jira/browse/FLINK-33021
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / AWS
>Affects Versions: aws-connector-4.2.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> https://github.com/apache/flink-connector-aws/actions/runs/6067488560/job/16459208589#step:9:879
> {code:java}
> Error:  Failures: 
> Error:Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests 
> should use a MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase does not 
> satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> [INFO] 
> Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 0
> {code}





[jira] [Assigned] (FLINK-32757) Update the copyright year in NOTICE files

2023-08-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-32757:
---

Assignee: Jiabao Sun

> Update the copyright year in NOTICE files
> -
>
> Key: FLINK-32757
> URL: https://issues.apache.org/jira/browse/FLINK-32757
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / MongoDB
>Affects Versions: mongodb-1.0.0, mongodb-1.0.1
>Reporter: Jiabao Sun
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: mongodb-1.0.2
>
>
> The current copyright year is 2014-2022 in NOTICE files. We should change it 
> to 2014-2023.





[jira] [Resolved] (FLINK-32757) Update the copyright year in NOTICE files

2023-08-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-32757.
-
Resolution: Fixed

> Update the copyright year in NOTICE files
> -
>
> Key: FLINK-32757
> URL: https://issues.apache.org/jira/browse/FLINK-32757
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / MongoDB
>Affects Versions: mongodb-1.0.0, mongodb-1.0.1
>Reporter: Jiabao Sun
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: mongodb-1.0.2
>
>
> The current copyright year is 2014-2022 in NOTICE files. We should change it 
> to 2014-2023.





[jira] [Commented] (FLINK-32757) Update the copyright year in NOTICE files

2023-08-04 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751227#comment-17751227
 ] 

Hong Liang Teoh commented on FLINK-32757:
-

Merged commit 
[{{fc656c4}}|https://github.com/apache/flink-connector-mongodb/commit/fc656c420e9b20676bf5e67c0c1c059a5ad44216]
 into apache:main

> Update the copyright year in NOTICE files
> -
>
> Key: FLINK-32757
> URL: https://issues.apache.org/jira/browse/FLINK-32757
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / MongoDB
>Affects Versions: mongodb-1.0.0, mongodb-1.0.1
>Reporter: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: mongodb-1.0.2
>
>
> The current copyright year is 2014-2022 in NOTICE files. We should change it 
> to 2014-2023.





[jira] [Created] (FLINK-32703) [hotfix] flink-python POM has a typo for protobuf-java in shading config

2023-07-27 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-32703:
---

 Summary: [hotfix] flink-python POM has a typo for protobuf-java in 
shading config
 Key: FLINK-32703
 URL: https://issues.apache.org/jira/browse/FLINK-32703
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.17.1, 1.16.2, 1.18.0
Reporter: Hong Liang Teoh
 Fix For: 1.18.0


Fix typo. `inculde` -> `include`

 

 
{code:xml}
<includes>
    <include>net.razorvine:*</include>
    <include>net.sf.py4j:*</include>
    <include>org.apache.beam:*</include>
    <include>com.fasterxml.jackson.core:*</include>
    <include>joda-time:*</include>
    <inculde>com.google.protobuf:*</inculde>
    <include>org.apache.arrow:*</include>
    <include>io.netty:*</include>
    <include>com.google.flatbuffers:*</include>
    <include>com.alibaba:pemja</include>
</includes>
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32619) ConfigOptions to support fallback configuration

2023-07-19 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744606#comment-17744606
 ] 

Hong Liang Teoh commented on FLINK-32619:
-

That's a great callout, [~wangm92]. Will use that instead and close this JIRA
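
For context, the existing mechanism being referred to here appears to be 
{{ConfigOption#withFallbackKeys}}, which already covers this use case. A 
minimal sketch, reusing the option from the description below (the wiring is 
illustrative, not the actual patch):

{code:java}
import java.time.Duration;
import org.apache.flink.configuration.ConfigOption;
import static org.apache.flink.configuration.ConfigOptions.key;

// If the primary key is not set, Configuration#get(option) falls back to
// the value configured under "web.refresh-interval".
public static final ConfigOption<Duration> CACHE_CHECKPOINT_STATISTICS_TIMEOUT =
        key("rest.cache.checkpoint-statistics.timeout")
                .durationType()
                .noDefaultValue()
                .withFallbackKeys("web.refresh-interval");
{code}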

> ConfigOptions to support fallback configuration
> ---
>
> Key: FLINK-32619
> URL: https://issues.apache.org/jira/browse/FLINK-32619
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Priority: Minor
>
> ConfigOptions has no option to specify a "fallback configuration" as the 
> default.
>  
> For example, if we want {{rest.cache.checkpoint-statistics.timeout}} to 
> fall back to web.refresh-interval instead of a static default value, we have 
> to specify
>  
> {code:java}
> @Documentation.OverrideDefault("web.refresh-interval")
> @Documentation.Section(Documentation.Sections.EXPERT_REST)
> public static final ConfigOption<Duration> 
> CACHE_CHECKPOINT_STATISTICS_TIMEOUT =
> key("rest.cache.checkpoint-statistics.timeout")
> .durationType()
> .noDefaultValue()
> .withDescription(
> "");
>  {code}
>  
>  
> The {{.noDefaultValue()}} is misleading as it actually has a default.
>  
> We should introduce a {{.fallbackConfiguration()}} that is handled gracefully 
> by doc generators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32619) ConfigOptions to support fallback configuration

2023-07-19 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-32619.
-
Resolution: Not A Problem

> ConfigOptions to support fallback configuration
> ---
>
> Key: FLINK-32619
> URL: https://issues.apache.org/jira/browse/FLINK-32619
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Priority: Minor
>
> ConfigOptions has no option to specify a "fallback configuration" as the 
> default.
>  
> For example, if we want {{rest.cache.checkpoint-statistics.timeout}} to 
> fall back to web.refresh-interval instead of a static default value, we have 
> to specify
>  
> {code:java}
> @Documentation.OverrideDefault("web.refresh-interval")
> @Documentation.Section(Documentation.Sections.EXPERT_REST)
> public static final ConfigOption<Duration> 
> CACHE_CHECKPOINT_STATISTICS_TIMEOUT =
> key("rest.cache.checkpoint-statistics.timeout")
> .durationType()
> .noDefaultValue()
> .withDescription(
> "");
>  {code}
>  
>  
> The {{.noDefaultValue()}} is misleading as it actually has a default.
>  
> We should introduce a {{.fallbackConfiguration()}} that is handled gracefully 
> by doc generators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32619) ConfigOptions to support fallback configuration

2023-07-18 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-32619:
---

 Summary: ConfigOptions to support fallback configuration
 Key: FLINK-32619
 URL: https://issues.apache.org/jira/browse/FLINK-32619
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Configuration
Affects Versions: 1.17.1, 1.16.2
Reporter: Hong Liang Teoh


ConfigOptions has no option to specify a "fallback configuration" as the 
default.

 

For example, if we want {{rest.cache.checkpoint-statistics.timeout}} to 
fall back to web.refresh-interval instead of a static default value, we have to 
specify

 
{code:java}
@Documentation.OverrideDefault("web.refresh-interval")
@Documentation.Section(Documentation.Sections.EXPERT_REST)
public static final ConfigOption<Duration> CACHE_CHECKPOINT_STATISTICS_TIMEOUT =
key("rest.cache.checkpoint-statistics.timeout")
.durationType()
.noDefaultValue()
.withDescription(
"");
 {code}
 

 

The {{.noDefaultValue()}} is misleading as it actually has a default.

 

We should introduce a {{.fallbackConfiguration()}} that is handled gracefully 
by doc generators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32508) Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-07-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740995#comment-17740995
 ] 

Hong Liang Teoh commented on FLINK-32508:
-

[~martijnvisser] Sounds great! Always happy to collaborate :D There were some 
pretty good suggestions on the mailing list thread! 
https://lists.apache.org/thread/kbo3973whb8nj5xvkpvhxrmgtmnbkhlv

> Flink-Metrics Prometheus - Native Histograms / Native Counters
> --
>
> Key: FLINK-32508
> URL: https://issues.apache.org/jira/browse/FLINK-32508
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Metrics
>Reporter: Ryan van Huuksloot
>Assignee: Hong Liang Teoh
>Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> There are new metric types in Prometheus that would allow the exporter to 
> write Counters and Histograms as native metrics in Prometheus (vs writing 
> them as Gauges). This requires an update to the Prometheus client, which has 
> changed its spec.
> To accommodate the new metric types while retaining the old option for 
> prometheus metrics, the recommendation is to *Add a new package such as 
> `flink-metrics-prometheus-native` and eventually deprecate the original.*
> Discussed more on the mailing list: 
> https://lists.apache.org/thread/kbo3973whb8nj5xvkpvhxrmgtmnbkhlv
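
As a rough illustration of what the native types look like on the client 
side, a sketch assuming the rewritten prometheus/client_java (1.x) API; the 
exact builder methods are an assumption and may differ:

{code:java}
import io.prometheus.metrics.core.metrics.Histogram;

// Sketch only: nativeOnly() asks the client to expose this as a Prometheus
// native histogram instead of classic fixed buckets, which is what the
// proposed reporter package would build on.
Histogram latency = Histogram.builder()
        .name("flink_task_latency_seconds")
        .help("Task latency in seconds")
        .nativeOnly()
        .register();
latency.observe(0.123);
{code}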



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32508) Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-07-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740939#comment-17740939
 ] 

Hong Liang Teoh commented on FLINK-32508:
-

Happy to take a look at this, can I please be assigned this?

> Flink-Metrics Prometheus - Native Histograms / Native Counters
> --
>
> Key: FLINK-32508
> URL: https://issues.apache.org/jira/browse/FLINK-32508
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Metrics
>Reporter: Ryan van Huuksloot
>Priority: Minor
> Fix For: 1.18.0, 1.19.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> There are new metric types in Prometheus that would allow the exporter to 
> write Counters and Histograms as native metrics in Prometheus (vs writing 
> them as Gauges). This requires an update to the Prometheus client, which has 
> changed its spec.
> To accommodate the new metric types while retaining the old option for 
> prometheus metrics, the recommendation is to *Add a new package such as 
> `flink-metrics-prometheus-native` and eventually deprecate the original.*
> Discussed more on the mailing list: 
> https://lists.apache.org/thread/kbo3973whb8nj5xvkpvhxrmgtmnbkhlv



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32537) Add compatibility annotation for REST API classes

2023-07-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740266#comment-17740266
 ] 

Hong Liang Teoh edited comment on FLINK-32537 at 7/5/23 3:42 PM:
-

> We already know that though due to the compatibility tests.

 

Got it. [~chesnay] Is the proposal to close this Jira and leave as-is?


was (Author: JIRAUSER292614):
> We already know that though due to the compatibility tests.

 

Got it. [~chesnay] Is the proposal to ignore the `@Internal` or `@Public` for 
REST API classes?

> Add compatibility annotation for REST API classes
> -
>
> Key: FLINK-32537
> URL: https://issues.apache.org/jira/browse/FLINK-32537
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Priority: Minor
> Fix For: 1.18.0
>
>
> *Why*
> We want to standardise the class labelling for Flink classes. Currently, the 
> compatibility annotations like @Public, @PublicEvolving, @Internal are not 
> present for REST API classes.
>  
> *What*
> We should add @Internal for most Flink classes, unless they change the REST 
> API variables, so we know clearly which components will change our REST API 
> when changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32537) Add compatibility annotation for REST API classes

2023-07-05 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740266#comment-17740266
 ] 

Hong Liang Teoh commented on FLINK-32537:
-

> We already know that though due to the compatibility tests.

 

Got it. [~chesnay] Is the proposal to ignore the `@Internal` or `@Public` for 
REST API classes?

> Add compatibility annotation for REST API classes
> -
>
> Key: FLINK-32537
> URL: https://issues.apache.org/jira/browse/FLINK-32537
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Priority: Minor
> Fix For: 1.18.0
>
>
> *Why*
> We want to standardise the class labelling for Flink classes. Currently, the 
> compatibility annotations like @Public, @PublicEvolving, @Internal are not 
> present for REST API classes.
>  
> *What*
> We should add @Internal for most Flink classes, unless they change the REST 
> API variables, so we know clearly which components will change our REST API 
> when changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32537) Add compatibility annotation for REST API classes

2023-07-04 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-32537:
---

 Summary: Add compatibility annotation for REST API classes
 Key: FLINK-32537
 URL: https://issues.apache.org/jira/browse/FLINK-32537
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / REST
Affects Versions: 1.17.1, 1.16.2
Reporter: Hong Liang Teoh
 Fix For: 1.18.0


*Why*

We want to standardise the class labelling for Flink classes. Currently, the 
compatibility annotations like @Public, @PublicEvolving, @Internal are not 
present for REST API classes.

 

*What*

We should add @Internal for most Flink classes, unless they change the REST 
API variables, so we know clearly which components will change our REST API 
when changed.
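
A minimal sketch of the intended labelling (the class is hypothetical, 
assuming the existing {{org.apache.flink.annotation}} package):

{code:java}
import org.apache.flink.annotation.Internal;

// Hypothetical handler-side class: per this proposal, most REST runtime
// classes would carry @Internal, so the few classes that define the REST
// API contract itself stand out.
@Internal
public class ExampleCheckpointStatsSupport {
}
{code}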



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32535) CheckpointingStatisticsHandler periodically returns NullArgumentException after job restarts

2023-07-04 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-32535:

Description: 
*What*

When making requests to the /checkpoints REST API after a job restart, we see 
HTTP 500 responses for a short period of time. We should handle this 
gracefully in the CheckpointingStatisticsHandler.

 

*How to replicate*
 * Checkpointing interval 1s
 * Job is constantly restarting
 * Make constant requests to /checkpoints REST API.

See [here|https://github.com/apache/flink/pull/22901#issuecomment-1617830035] 
for more info

 

Stack trace:

{{org.apache.commons.math3.exception.NullArgumentException: input array}}
{{    at 
org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1753)}}
{{    at 
org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53)}}
{{    at 
org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108)}}
{{    at 
org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:133)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:85)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:59)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.AbstractCheckpointStatsHandler.lambda$handleRequest$1(AbstractCheckpointStatsHandler.java:62)}}
{{    at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)}}
{{    at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)}}
{{    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)}}
{{    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)}}
{{    at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)}}
{{    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)}}
{{    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)}}
{{    at java.base/java.lang.Thread.run(Thread.java:829)}}

 

See the graphs below from testing. The dips in the green line correspond to 
the failures immediately after a job restart.

!https://user-images.githubusercontent.com/35062175/250529297-908a6714-ea15-4aac-a7fc-332589da2582.png!

  was:
*What*

When making requests to the /checkpoints REST API after a job restart, we see 
HTTP 500 responses for a short period of time. We should handle this 
gracefully in the CheckpointingStatisticsHandler.

 

*How to replicate*
 * Checkpointing interval 1s
 * Job is constantly restarting
 * Make constant requests to /checkpoints REST API.

 

Stack trace:

{{org.apache.commons.math3.exception.NullArgumentException: input array}}
{{    at 
org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1753)}}
{{    at 
org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53)}}
{{    at 
org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108)}}
{{    at 
org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:133)}}
{{    at 

[jira] [Created] (FLINK-32535) CheckpointingStatisticsHandler periodically returns NullArgumentException after job restarts

2023-07-04 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-32535:
---

 Summary: CheckpointingStatisticsHandler periodically returns 
NullArgumentException after job restarts
 Key: FLINK-32535
 URL: https://issues.apache.org/jira/browse/FLINK-32535
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.17.1, 1.16.2
Reporter: Hong Liang Teoh
 Fix For: 1.18.0


*What*

When making requests to the /checkpoints REST API after a job restart, we see 
HTTP 500 responses for a short period of time. We should handle this 
gracefully in the CheckpointingStatisticsHandler.

 

*How to replicate*
 * Checkpointing interval 1s
 * Job is constantly restarting
 * Make constant requests to /checkpoints REST API.

 

Stack trace:

{{org.apache.commons.math3.exception.NullArgumentException: input array}}
{{    at 
org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1753)}}
{{    at 
org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272)}}
{{    at 
org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159)}}
{{    at 
org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53)}}
{{    at 
org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108)}}
{{    at 
org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:133)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:85)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:59)}}
{{    at 
org.apache.flink.runtime.rest.handler.job.checkpoints.AbstractCheckpointStatsHandler.lambda$handleRequest$1(AbstractCheckpointStatsHandler.java:62)}}
{{    at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)}}
{{    at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)}}
{{    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)}}
{{    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)}}
{{    at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)}}
{{    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)}}
{{    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)}}
{{    at java.base/java.lang.Thread.run(Thread.java:829)}}

 

See the graphs below from testing. The dips in the green line correspond to 
the failures immediately after a job restart.

!https://user-images.githubusercontent.com/35062175/250529297-908a6714-ea15-4aac-a7fc-332589da2582.png!
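
The root cause is visible in the top frames: immediately after a restart the 
histogram snapshot holds no samples yet, and commons-math3 rejects the 
missing input array. A minimal repro, assuming only commons-math3 on the 
classpath:

{code:java}
import org.apache.commons.math3.stat.descriptive.rank.Percentile;

public class EmptyPercentileRepro {
    public static void main(String[] args) {
        // Mirrors what the handler effectively does when the snapshot has no
        // values: Percentile#evaluate(null, p) throws
        // NullArgumentException("input array"), which surfaces as an HTTP 500.
        new Percentile().evaluate(null, 50.0);
    }
}
{code}

A graceful fix could guard the quantile computation in the handler path, 
e.g. returning a sentinel value while the snapshot is still empty (the actual 
patch may differ).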



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

